India’s COVID-19 fight — A brief Data Analysis — I

Sreshta Putchala
Oct 4, 2020

This post looks at the data on Covid19 cases in India. In this first part, I try to understand the distribution of three features: daily case count, recoveries, and fatalities. The aim is to determine how these series are distributed before modeling them for predictions. The data covers March 12th to September 23rd and is sourced from www.covid19india.org. I have not run predictions on this dataset because cases appear to have come down over the last couple of weeks, which suggests the trajectory and underlying dynamics have changed. If the downward trajectory continues for another 14 days, I will remodel the data for projections.

The preliminary time series does not yet show the normal or log-normal shape of a typical pandemic curve, which suggests India’s fight is not over yet. The data also shows a steady rise in cases until the second week of September.

Daily Cases and Recoveries with trend lines
Daily fatalities with Trend line

I first tried to fit the data with a linear model, but its intercept was negative, which makes little sense for case counts. I therefore tried generalized linear models (glm) with linear, quadratic, and higher-order polynomial terms, and used the same glm library to fit the exponential model.
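To make the comparison of model families concrete, here is a minimal sketch in plain Python. It is not the glm code used for this analysis: it swaps in ordinary least squares for the polynomial fits and a straight-line fit on log counts for the exponential model, and the series is synthetic, made up only for illustration. All function names are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_polynomial(days, counts, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*t + c2*t^2 + ..."""
    k = degree + 1
    # Normal equations: (X^T X) c = X^T y for the design matrix X[t][i] = t^i.
    a = [[sum(t ** (i + j) for t in days) for j in range(k)] for i in range(k)]
    b = [sum(y * t ** i for t, y in zip(days, counts)) for i in range(k)]
    return solve_linear_system(a, b)

def fit_exponential(days, counts):
    """Fit y = A * exp(r * t) via a line through log(y); counts must be > 0."""
    c0, c1 = fit_polynomial(days, [math.log(y) for y in counts], 1)
    return math.exp(c0), c1

# Synthetic series with quadratic growth (illustrative values, not real data).
days = list(range(1, 11))
counts = [3.0 * t * t + 2.0 * t + 5.0 for t in days]

linear = fit_polynomial(days, counts, 1)      # [intercept, slope]
quadratic = fit_polynomial(days, counts, 2)   # [c0, c1, c2]
amplitude, rate = fit_exponential(days, counts)

# On convex data like this, the straight line is forced below zero at t = 0:
# the fitted intercept is negative (about -61) even though every count is positive.
print("linear intercept:", linear[0])
print("quadratic coefficients:", quadratic)
```

The negative intercept in this toy example has the same cause as the one described above: a straight line fitted through accelerating counts must cut the axis below zero. Fitting the exponential on log counts is a convenient approximation and is not identical to a glm with a log link.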

Model characteristics of Daily cases reported
Model characteristics of Daily recoveries reported

According to its F-statistic, the quadratic model seems to fit well, although the exponential model has lower residual values.
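For readers who want to see what these two quantities measure, the sketch below computes the residual sum of squares and the overall F-statistic of a regression from observed and fitted values. It is a textbook formula written out in plain Python, not the output of the glm library; the function name and the five-point example are hypothetical.

```python
def f_statistic(observed, fitted, num_predictors):
    """Return (F, RSS) for a regression with an intercept.

    F = ((TSS - RSS) / p) / (RSS / (n - p - 1)), where p is the number of
    predictors excluding the intercept and n is the number of observations.
    """
    n = len(observed)
    mean_y = sum(observed) / n
    tss = sum((y - mean_y) ** 2 for y in observed)              # total variation
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))   # unexplained part
    df_model = num_predictors
    df_resid = n - num_predictors - 1
    return ((tss - rss) / df_model) / (rss / df_resid), rss

# Tiny made-up example: y observed on days 1..5, fitted by the least-squares
# line y = 0.6 + 0.8 * t (slope and intercept worked out by hand).
observed = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [0.6 + 0.8 * t for t in range(1, 6)]

f_value, rss = f_statistic(observed, fitted, num_predictors=1)
print("F-statistic:", f_value)
print("residual sum of squares:", rss)
```

One caveat when comparing models this way: if the exponential model is fitted on log-transformed counts, its residuals are on the log scale and will look small next to residuals measured in raw counts, so the two are not directly comparable.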

Daily case volume distributions and model fit

I could have used parameter estimation methods such as maximum likelihood estimation, moment matching, quantile matching, or goodness-of-fit estimation, but the distribution of the data seems open-ended and more amenable to regression. This also shows that the first wave of Covid19 is still in play in India.

Since daily count, daily recovered, and active cases are highly correlated, I have shown the fit for daily active cases only, as it mirrors the daily count (correlation of 0.98). The other correlations are:

Cases and Recovered: 0.98
Cases and Deceased: 0.93
Recovered and Deceased: 0.87
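Correlations like these can be reproduced with a few lines of plain Python. The sketch below computes the Pearson correlation for each pair of series; the three short series are made-up placeholders, not the covid19india.org data, and the names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder daily series (illustrative values only).
series = {
    "cases": [100, 150, 210, 300, 420, 560],
    "recovered": [60, 90, 140, 200, 310, 400],
    "deceased": [2, 3, 3, 6, 7, 12],
}

pairs = [("cases", "recovered"), ("cases", "deceased"), ("recovered", "deceased")]
correlations = {
    (a, b): pearson(series[a], series[b]) for a, b in pairs
}

for (a, b), value in correlations.items():
    print(f"{a} and {b}: {value:.2f}")
```

Note that series which all trend upward over time will show high correlations almost by construction, so values near 1 say more about the shared trend than about day-to-day co-movement.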

Model characteristics of Daily fatalities

The “Deceased” series is slightly less correlated with either the daily case volume or recoveries; fatalities follow the active cases and perhaps have a time lag associated with them. I calculated the correlations after removing the outlier that occurred on day 110.
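One way to probe the suspected time lag is to correlate cases on day t with fatalities on day t + lag and see which lag gives the highest value. This is only a sketch of that idea, not something the analysis above performed; the series are synthetic, with a 3-day lag built in on purpose, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(cases, deaths, lag):
    """Correlate cases on day t with deaths on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(cases, deaths)
    # Drop the last `lag` case values and the first `lag` death values
    # so that the two slices line up day-for-day.
    return pearson(cases[:-lag], deaths[lag:])

def best_lag(cases, deaths, max_lag):
    """Lag in 0..max_lag with the highest lagged correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(cases, deaths, lag))

# Synthetic data: a wavy, rising case curve and deaths equal to 2% of the
# cases reported `true_lag` days earlier (assumed values for illustration).
num_days = 60
true_lag = 3
cases = [1000 + 300 * math.sin(t / 4.0) + 5 * t for t in range(num_days)]
deaths = [0.02 * cases[max(t - true_lag, 0)] for t in range(num_days)]

estimated_lag = best_lag(cases, deaths, max_lag=10)
print("estimated lag in days:", estimated_lag)
```

On real data the peak is rarely this clean, and a smoothly rising series can make many lags look almost equally good, so the result is best read as a rough indication.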

The following graphs are for Daily fatalities. The graphs for the recoveries have a similar distribution.

Daily fatalities and model fit

We would have expected the fatalities curve to be quadratic too, but the fit may not have picked up the curvature yet, probably because of the scale and number of observations. Fatality counts are very low compared with cases and recoveries (just 1–2%), so unless we have many more data points, the data may not fit the quadratic or exponential models. However, we can conclude that case volume and recoveries behave differently from fatalities.

I also fitted models to the following ratios: Deceased/Reported Cases (also called the case fatality rate, CFR), Deceased/Recoveries, and Deceased/Active Cases.
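The three ratios can be sketched as follows. The snippet assumes cumulative counts and takes active cases as confirmed minus recovered minus deceased; whether the analysis used cumulative or daily figures is not stated here, so treat that choice, the function names, and the numbers as assumptions.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

def fatality_ratios(confirmed, recovered, deceased):
    """Fatality ratios from cumulative counts on a single day."""
    # Assumption: active cases are those neither recovered nor deceased.
    active = confirmed - recovered - deceased
    return {
        "active": active,
        "cfr": safe_ratio(deceased, confirmed),                # Deceased / Reported Cases
        "deceased_per_recovered": safe_ratio(deceased, recovered),
        "deceased_per_active": safe_ratio(deceased, active),
    }

# Made-up cumulative counts for one day (not real data).
ratios = fatality_ratios(confirmed=1000, recovered=700, deceased=20)
for name, value in ratios.items():
    print(name, value)
```

Early in an outbreak the recovered or active counts can be zero, which is why the division is guarded rather than allowed to raise an error.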

Fatality ratios with respect to case volumes, active cases, and recoveries

We notice two inflection points, one around day 100 and one in the 3rd week of September, where the behavior of the pandemic appears to have changed. To identify the underlying factors, I will dig deeper in the coming posts of this series with detailed state-wise analyses and the number of tests administered.

The following analyses will be covered in subsequent posts:
1. State-wise analyses
2. Analysis including testing volumes
3. Predicting the case volumes and fatalities
4. Effectiveness of states (scoring and ranking)

The code and the complete analysis are in my GitHub repository: https://github.com/5re5htaRushya/covid_19_projections
