Abstract

This article presents a detailed analysis of Twitter data to examine how information about the COVID-19 epidemic spread in the US. The objectives are to identify the key terms and features used in the tweets, the level of interest in COVID-19 topics, and the evolution of the discussion across the US. To identify topics, the paper proposes an approach that combines peak detection and clustering techniques. Space-time features are extracted from the tweets and modeled as time series. Peaks are then detected in the time series, and peaks of textual features are clustered based on their co-occurrence in the tweets. Each resulting cluster is then associated with a topic. Experiments on a real-world dataset of tweets related to COVID-19 in the US show that the proposed approach is able to accurately detect several relevant topics of interest, of varying importance and character, including health status, symptoms, and the pandemic's implications for people's lives. A case study on the correlation of Twitter data with confirmed COVID-19 cases is also presented, evaluating the feasibility of exploiting Twitter for predicting outbreak diffusion. Results highlight a high correlation between tweets and real COVID-19 data, indicating that Twitter can be considered a reliable indicator of the epidemic's spread and that data generated by user activity on social media is becoming an invaluable source for capturing and understanding epidemic outbreaks.
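
The pipeline summarized above (time series per textual feature, peak detection, clustering of peaked features by co-occurrence) can be illustrated with a minimal sketch. The abstract does not say which peak detector or clustering algorithm the paper uses, so everything here is an assumption made for illustration: the threshold rule (mean plus `z` standard deviations), the Jaccard cutoff `min_jaccard`, the connected-components grouping, the function names, and the toy tweets.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def term_series(tweets, n_days):
    """Build one daily count series per term from (day, set_of_terms) pairs."""
    counts = defaultdict(lambda: [0] * n_days)
    for day, terms in tweets:
        for term in terms:
            counts[term][day] += 1
    return counts


def detect_peaks(series, z=1.5):
    """Indices whose value exceeds mean + z * std (an assumed, simple rule)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:  # a flat series has no peaks
        return []
    return [i for i, v in enumerate(series) if v > mu + z * sigma]


def cluster_by_cooccurrence(tweets, terms, min_jaccard=0.5):
    """Group terms that appear in largely the same tweets.

    Two terms are linked when the Jaccard similarity of their tweet sets
    reaches min_jaccard; clusters are the connected components of the links.
    """
    docs = {t: {i for i, (_, ts) in enumerate(tweets) if t in ts} for t in terms}
    parent = {t: t for t in terms}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(terms), 2):
        union = docs[a] | docs[b]
        if union and len(docs[a] & docs[b]) / len(union) >= min_jaccard:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


# Toy data (hypothetical): each tweet is (day index, set of extracted terms).
tweets = [
    (0, {"news"}),
    (1, {"news"}),
    (2, {"news", "fever", "cough"}),
    (2, {"fever", "cough"}),
    (2, {"fever", "cough"}),
    (3, {"news"}),
    (4, {"news", "lockdown", "school"}),
    (4, {"lockdown", "school"}),
    (4, {"lockdown", "school"}),
    (5, {"news"}),
]

series = term_series(tweets, n_days=6)
peaked_terms = {t for t, s in series.items() if detect_peaks(s)}
clusters = cluster_by_cooccurrence(tweets, peaked_terms)
print(clusters)
```

In this toy input the term "news" appears at a constant rate and so produces no peak, while the bursty terms are grouped by the tweets they share; each resulting group plays the role of a candidate topic. A real analysis would also need the spatial dimension and a more robust detector, neither of which is sketched here.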

