Exploring Socioeconomic Status as a Global Determinant of COVID-19 Prevalence, Using Exploratory Data Analytic and Supervised Machine Learning Techniques: Algorithm Development and Validation Study.

Luke Winston, Michael Mccann, George Onofrei

doi:10.2196/35114

Abstract

Background: The COVID-19 pandemic represents an unprecedented global challenge. As the global community attempts to manage the pandemic in the long term, it is pivotal to understand what factors drive prevalence rates and to predict the future trajectory of the virus.

Objective: This study had 2 objectives. First, it tested the statistical relationship between socioeconomic status and COVID-19 prevalence. Second, it used machine learning techniques to predict cumulative COVID-19 cases in a sample of 182 countries. Taken together, these objectives shed light on socioeconomic status as a global risk factor of the COVID-19 pandemic.

Methods: This research used exploratory data analysis and supervised machine learning methods. Exploratory analysis included variable distribution, variable correlations, and outlier detection. Next, 3 supervised regression techniques were applied: linear regression, random forest, and adaptive boosting (AdaBoost). Results were evaluated using k-fold cross-validation and subsequently compared to analyze algorithmic suitability. The analysis involved 2 models. First, the algorithms were trained to predict 2021 COVID-19 prevalence using only 2020 reported case data. Then, socioeconomic indicators were added as features and the algorithms were trained again. The Human Development Index (HDI) metrics of life expectancy, mean years of schooling, expected years of schooling, and gross national income were used to approximate socioeconomic status.

Results: All variables correlated positively with the 2021 COVID-19 prevalence, with R² values ranging from 0.55 to 0.85. Using socioeconomic indicators, COVID-19 prevalence was predicted with a reasonable degree of accuracy. With 2020 reported case rates as the sole predictor of 2021 prevalence rates, the average predictive accuracy of the algorithms was low (R²=0.543). When socioeconomic indicators were added alongside 2020 prevalence rates as features, the average predictive performance improved considerably (R²=0.721) and all error statistics decreased. Thus, adding socioeconomic indicators alongside 2020 reported case data considerably improved the prediction of COVID-19 prevalence. Linear regression was the strongest learner, with R²=0.693 on the first model and R²=0.763 on the second model, followed by random forest (0.481 and 0.722) and AdaBoost (0.454 and 0.679). The second model was then retrained using a selection of additional COVID-19 risk factors (population density, median age, and vaccination uptake) instead of the HDI metrics. However, average accuracy dropped to 0.649, which highlights the value of socioeconomic status as a predictor of COVID-19 cases in the chosen sample.

Conclusions: The results show that socioeconomic status is an important variable to consider in future epidemiological modeling, and they highlight the reality of the COVID-19 pandemic as both a social phenomenon and a health care phenomenon. This paper also puts forward new considerations about the application of statistical and machine learning techniques to understand and combat the COVID-19 pandemic.
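The two-model comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it uses only ordinary least squares (random forest and AdaBoost are omitted), assumes k=5 because the abstract does not state k, and runs on synthetic stand-in data. The names `cases_2020`, `ses_index`, and `cases_2021` and the coefficients used to generate them are hypothetical, so the printed scores bear no relation to the reported results.

```python
import random
from statistics import mean


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(X, y, k=5, seed=0):
    """Mean held-out R2 over k folds (k=5 is an assumption, not from the paper)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(coef, [X[i] for i in held_out])
        scores.append(r_squared([y[i] for i in held_out], y_pred))
    return mean(scores)


# Synthetic stand-in data (NOT the study's data): 182 hypothetical countries.
rng = random.Random(42)
n_countries = 182
cases_2020 = [rng.uniform(0, 10) for _ in range(n_countries)]
ses_index = [rng.uniform(0, 1) for _ in range(n_countries)]
cases_2021 = [1.5 * c + 20 * s + rng.gauss(0, 2)
              for c, s in zip(cases_2020, ses_index)]

# Model 1: prior-year cases only. Model 2: prior-year cases plus the indicator.
baseline_r2 = k_fold_r2([[c] for c in cases_2020], cases_2021)
extended_r2 = k_fold_r2([[c, s] for c, s in zip(cases_2020, ses_index)], cases_2021)
print(f"baseline R2: {baseline_r2:.3f}, extended R2: {extended_r2:.3f}")
```

Because the synthetic target is built to depend on `ses_index`, the extended model scores higher by construction; the sketch only shows the evaluation protocol (same folds, same metric, two feature sets), not evidence for the study's finding.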

Journal: JMIR Formative Research
Publication Date: Sep 27, 2022
Citations: 2
License type: CC BY

