The initial outbreak of COVID-19 was reported in China in December 2019. Since its inception, the pandemic has posed unforeseen challenges and caused severe economic and social disruption. An effective approach to forecasting infections would help the health sector and administrations plan strategically and manage preventive and curative measures efficiently. Most existing studies use image datasets for COVID-19 prediction, whereas studies involving structural data are very rare. This paper therefore first provides an exhaustive review of COVID-19 forecasting papers, with emphasis on structural data. It then introduces an approach to COVID-19 infection forecasting that uses structural datasets instead of image datasets: a novel multi-source transfer-learning framework that integrates demographic, economic, and COVID-19 data to forecast the spread within a province. COVID-19 forecasting depends on several parameters, such as current case statistics, geographical area, population density, and economic status (for example, GDP). However, the dataset generated for an individual province is by itself too small for a precise forecast. Transfer learning helps in such cases by drawing on data collected from multiple provinces. Because the data are time series, we also use lagged features to predict COVID-19 cases. Beyond the review, this study thus aims to develop robust machine learning models through a novel and efficient multi-source transfer learning technique for accurate forecasting of COVID-19 in a province. The proposed approach has been evaluated on a wide range of datasets covering sixty-two provinces from a diverse set of countries.
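To make the combination of lagged features and multi-source data concrete, the following is a minimal sketch in plain Python, not the framework proposed in the paper. It assumes the simplest possible transfer step, pooling training rows from several source provinces, and all names (`make_lagged_rows`, `pool_source_provinces`, the province labels, and the numbers) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[List[float], float]  # (feature vector, target case count)


def make_lagged_rows(cases: Sequence[float],
                     static_features: Sequence[float],
                     n_lags: int = 3) -> List[Row]:
    """Turn one province's case series into supervised rows.

    Each row holds the province's static features (for example population
    density and GDP) followed by the previous n_lags case counts, oldest
    first. The target is the case count at the current time step.
    """
    rows: List[Row] = []
    for t in range(n_lags, len(cases)):
        lags = list(cases[t - n_lags:t])
        rows.append((list(static_features) + lags, cases[t]))
    return rows


def pool_source_provinces(
        sources: Dict[str, Tuple[Sequence[float], Sequence[float]]],
        n_lags: int = 3) -> List[Row]:
    """Concatenate lagged rows from every source province.

    Assumption: plain pooling stands in for the transfer step; the paper's
    actual multi-source technique is not described in the abstract.
    """
    pooled: List[Row] = []
    for _name, (cases, static) in sources.items():
        pooled.extend(make_lagged_rows(cases, static, n_lags))
    return pooled


# Hypothetical data: (case series, [population density, GDP]) per province.
sources = {
    "province_a": ([10, 12, 15, 20, 26], [350.0, 4.2]),
    "province_b": ([3, 4, 4, 6], [120.0, 1.1]),
}
pooled = pool_source_provinces(sources, n_lags=3)
```

The pooled rows could then be passed to any regressor, such as a decision tree, and the fitted model applied to lagged rows built for the target province.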
We also tuned the hyperparameters of the machine learning models using Bayesian optimisation, and then applied the Friedman and Nemenyi tests to compare the results of the different models. The empirical results indicate that, with the proposed approach, simpler models such as Decision Trees forecast considerably more precisely than complex models. Under data scarcity, when target-domain data cannot be used to train or fine-tune the models, simpler models outperform complex ones because they generalize better. Hence, the proposed methodology is promising and valuable for governments and organizations facing the challenges of a pandemic outbreak, supporting better healthcare planning and management even when data are scarce.
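As an illustration of the statistical comparison, the sketch below computes the Friedman chi-square statistic from average ranks and the Nemenyi critical difference in plain Python. It is a hedged example rather than the authors' code: the error matrix is hypothetical, the function names are invented for this sketch, and the critical value `q_alpha` must be supplied by the caller from a studentized-range table (2.343 is the commonly tabulated value for three models at the 0.05 level).

```python
import math
from typing import List, Sequence, Tuple


def rank_row(values: Sequence[float]) -> List[float]:
    """Rank one row (1 = lowest error); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(errors: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return (chi-square statistic, average rank per model).

    Rows are datasets (here, provinces); columns are models.
    """
    n, k = len(errors), len(errors[0])
    rank_sums = [0.0] * k
    for row in errors:
        for m, r in enumerate(rank_row(row)):
            rank_sums[m] += r
    avg_ranks = [s / n for s in rank_sums]
    stat = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return stat, avg_ranks


def nemenyi_critical_difference(k: int, n: int, q_alpha: float) -> float:
    """Two models differ significantly if their average ranks differ by more."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical forecast errors: 4 provinces (rows) x 3 models (columns).
errors = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.20],
    [0.05, 0.30, 0.40],
    [0.12, 0.18, 0.25],
]
stat, avg_ranks = friedman_statistic(errors)
cd = nemenyi_critical_difference(k=3, n=4, q_alpha=2.343)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; that lookup is omitted here to keep the sketch free of external dependencies.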