Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach.

Jordan J Bird,Chloe M Barnes,Diego R Faria,Anikó Ekárt,Cristiano Premebida,Haralampos Hatzikirou

doi:10.1371/journal.pone.0241332

Open Access

https://doi.org/10.1371/journal.pone.0241332

Abstract

In this work we present a three-stage Machine Learning approach to country-level risk classification, based on countries that are reporting COVID-19 information. A K% binning discretisation (K = 25) is used to create four risk groups of countries for each of three measures: risk of transmission (coronavirus cases per million population), risk of mortality (coronavirus deaths per million population), and risk of inability to test (coronavirus tests per million population). The four risk groups produced by K% binning are labelled ‘low’, ‘medium-low’, ‘medium-high’, and ‘high’. Coronavirus-related data are then removed, and the attributes used to predict the three types of risk are the geopolitical and demographic data describing each country. Thus, the class labels are calculated from coronavirus data, but the input attributes are country-level information independent of coronavirus data. The three four-class classification problems are then explored and benchmarked through leave-one-country-out cross validation to find the strongest model for each: a Stack of Gradient Boosting and Decision Tree algorithms for risk of transmission, a Stack of Support Vector Machine and Extra Trees for risk of mortality, and a Gradient Boosting algorithm for risk of inability to test. It is noted that a high risk of inability to test is often coupled with low risks of transmission and mortality; the risk of inability to test should therefore be interpreted first, before consideration is given to the predicted transmission and mortality risks. Finally, the approach is applied to more recent risk levels derived from September 2020 data, and weaker results are noted. This is attributed to the growth of international collaboration detracting from the useful knowledge carried by country-level attributes, which suggests that similar machine learning approaches are more useful early on, before a situation has unfolded.
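
The K% binning step described in the abstract can be illustrated with a minimal sketch. It assumes that binning is done by rank, so that K = 25 yields four equally sized groups, and that a higher value means a higher risk (as with cases or deaths per million; for tests per million the ordering would presumably be reversed). The function name `k_percent_bins` and the example figures are hypothetical and are not taken from the paper.

```python
def k_percent_bins(values, k=25,
                   labels=("low", "medium-low", "medium-high", "high")):
    """Assign each value to one of 100/k equal-frequency bins by rank.

    Assumption: higher value = higher risk. Ties are broken by input order.
    """
    n_bins = 100 // k
    if n_bins != len(labels):
        raise ValueError("need exactly one label per bin")
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for rank, idx in enumerate(order):
        # Map the rank (0 .. n-1) onto a bin index (0 .. n_bins-1).
        bin_index = min(rank * n_bins // len(values), n_bins - 1)
        result[idx] = labels[bin_index]
    return result


# Made-up cases-per-million figures for eight hypothetical countries.
cases_per_million = [10, 80, 30, 60, 20, 70, 40, 50]
risk_labels = k_percent_bins(cases_per_million, k=25)
print(risk_labels)
```

With eight values and K = 25, each of the four risk groups receives two countries.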

Highlights

According to the Future of Humanity Institute, there is a 2.05% chance that mankind will go extinct by the year 2100 through either a natural or an engineered pandemic [1].
Stacked Generalisation (Stacking) [40] is the process of training a machine learning algorithm to interpret the predictions of an ensemble of algorithms trained upon the dataset, in a process of meta-learning.
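
As a rough illustration of Stacking evaluated with leave-one-country-out cross validation, the sketch below uses scikit-learn's `StackingClassifier` on synthetic data in which each row stands in for one country. The base learners (Gradient Boosting and a Decision Tree) mirror the stack named in the abstract for risk of transmission; the logistic-regression meta-learner, the hyperparameters, and the data are assumptions made for this example and are not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 40 "countries", 5 country-level attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
score = X[:, 0] + 0.5 * X[:, 1]
# Four classes (0..3) from the quartiles of a made-up risk score.
y = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))

# Base learners whose predictions are interpreted by a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=20, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=3,  # internal folds used to build the meta-learner's training set
)

# With one row per country, leave-one-out is leave-one-country-out:
# each country is predicted by a model trained on all the others.
predictions = cross_val_predict(stack, X, y, cv=LeaveOneOut())
accuracy = float(np.mean(predictions == y))
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Because the meta-learner is trained on out-of-fold predictions from the base learners, it learns how much to trust each of them instead of simply averaging their outputs.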

Summary

Introduction

The virus initially spread rapidly across the globe, mortality began to rise, and countries desperately struggled to test their citizens for the virus once it became known that many infectious carriers show no noticeable symptoms [2,3,4]. This suggests three main risk factors to observe: the initial risk of transmission, due to factors such as population density [5] and international travel [6]; the risk of mortality, due to ageing populations [7] and underlying health issues [8, 9]; and the risk of a country not being able to test its citizens adequately, which may lead to under-reported measures of the previous two [10]. Health service data trend models have been shown to aid in classification of the virus [11, 12], vaccine design [13], estimation of cases, deaths, and recoveries [14, 15], simulating what could have happened if ‘lockdown’ had not been instituted [16], and simulating the spread of the disease using prior knowledge from other locations [17].

Methods

Results

Conclusion