This article proposes a machine-learning-augmented technique for predicting the coronavirus disease (COVID-19) outbreak in India by combining Internet search trends with social media data retrieved from Twitter. A comprehensive list of relevant search keywords is used to select a large collection of tweets, and the Internet search trends for the same keywords are fetched. First, a lag correlation analysis is conducted to determine how many days ahead of the current time COVID-19 cases can be accurately predicted. Second, both shallow and deep learning methods are used to predict the number of COVID-19 cases in a specific geospatial location in India. Thereafter, state-wise air pollution data collected from the Central Pollution Control Board, Government of India, are incorporated to understand the effect of air pollution on the spread of COVID-19, and the air pollution monitoring parameters are combined to assess their effect on the prediction of COVID-19 cases in the Indian context. Experimental results reveal that the proposed method can make accurate predictions 85 days ahead of the current time (r > 0.85), demonstrating its effectiveness in predicting the spread of COVID-19 in advance.
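The lag correlation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two equal-length daily series (a keyword signal such as search volume or tweet counts, and reported case counts) and uses the Pearson coefficient, which the abstract's "r" suggests but does not confirm. The function names `pearson`, `lag_correlations`, and `best_lag` are hypothetical, and the data are synthetic.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_correlations(signal, cases, max_lag):
    """Map each lag (in days) to corr(signal[t], cases[t + lag])."""
    if len(signal) != len(cases):
        raise ValueError("series must have the same length")
    if max_lag > len(signal) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    n = len(signal)
    return {
        lag: pearson(signal[: n - lag], cases[lag:])
        for lag in range(max_lag + 1)
    }


def best_lag(signal, cases, max_lag):
    """Return (lag, r) for the lag with the highest correlation."""
    corr = lag_correlations(signal, cases, max_lag)
    lag = max(corr, key=corr.get)
    return lag, corr[lag]


# Synthetic illustration (assumed data, not from the study):
# the case series echoes the keyword signal 7 days later, plus noise.
rng = random.Random(0)
true_lag = 7
base = [rng.gauss(0.0, 1.0) for _ in range(120 + true_lag)]
signal = base[true_lag:]                               # signal[t] = base[t + 7]
cases = [b + rng.gauss(0.0, 0.1) for b in base[:120]]  # cases[t + 7] ~ signal[t]

lag, r = best_lag(signal, cases, max_lag=30)
print(f"best lag: {lag} days, r = {r:.3f}")
```

In this toy setting the lag with the highest correlation recovers the delay built into the synthetic data. On real search, tweet, and case series the correlation peak is usually broader, so the selected lag is better read as an estimate than as an exact lead time.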