4.2 Data Binning

4.2.1 Flight Dataset

Here is an overview of the binned data.

We binned the data obtained by combining two columns, 'origin' and 'destination', which after processing gives the number of flights per day. We used equal-width binning, which suits this kind of data because the number of flights per day should stay fairly stable in the absence of outside influences.
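The equal-width binning of daily flight counts described above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the record layout `(date, origin, destination)`, the sample values, and the helper name `equal_width_bins` are all assumptions made for the example.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Return (edges, counts) for n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # The maximum value sits on the last edge, so clamp it into the final bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return edges, counts

# Hypothetical flight records, one tuple per flight: (date, origin, destination).
daily = {"2020-03-01": 1, "2020-03-02": 2, "2020-03-03": 4, "2020-03-04": 7}
flights = [(day, "AAA", "BBB") for day, n in daily.items() for _ in range(n)]

# Each (origin, destination) row is one flight, so counting rows per date
# gives the number of flights per day.
flights_per_day = Counter(date for date, _, _ in flights)

edges, counts = equal_width_bins(list(flights_per_day.values()), n_bins=3)
print(edges)   # bin boundaries
print(counts)  # number of days falling in each bin
```

Every bin has the same width, so the bin boundaries depend only on the minimum and maximum daily counts. That is reasonable when the values are spread fairly evenly, as assumed here for daily flight counts.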

From this chart we can see that the average number of flights per day falls between 6906 and 10358, and that the overall trend is toward higher frequencies. This suggests a preliminary hypothesis: people's travel has no direct relationship with their attitudes toward COVID-19 or with the spread of the disease.

4.2.2 Covid-19 Cases and Death Dataset

  • tot_cases is the key field in the CDC data. It reflects the number of cases in a given state on a given day. Binning it shows how the case counts are distributed, lets us filter out some abnormal values, and discretizes the data. This makes the analysis more robust, since it will not break down when extremely large or small values are fed in. In addition, if we break the data down further by state, we can find the peak date of the epidemic in each state.
  • For tot_cases, we divide the range into intervals of equal width.
  • We find that the first bin contains half of the cases.
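To make the last observation concrete, the share of records falling into each equal-width bin can be computed as in the sketch below. The `tot_cases` values are made up for illustration and are not taken from the CDC data. The helper name `bin_shares` and the choice of four bins are also assumptions.

```python
def bin_shares(values, n_bins):
    """Return the fraction of values that fall into each equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the maximum value into the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return [c / len(values) for c in counts]

# Made-up, right-skewed sample standing in for the tot_cases column.
tot_cases = [0, 10, 50, 120, 300, 900, 2500, 10000]

shares = bin_shares(tot_cases, n_bins=4)
print(shares)  # first entry is the share of records in the lowest bin
```

With skewed data like this, equal-width bins put most records in the lowest interval, because a few very large values stretch the overall range. That matches the pattern noted above, where the first bin holds a large share of the records.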