Abstract

According to the World Health Organization (WHO), vector-borne diseases such as malaria and dengue account for 17% of all infectious disease cases and cause more than 700,000 deaths per year. Tracking and predicting their spread is therefore a vital task that could save hundreds of thousands of lives annually. The first reports of an outbreak often appear in emails and online reporting systems long before the outbreak is officially documented. Anticipating the emergence and spread of outbreaks thus requires extracting data from these unstructured sources and combining it with historical weather and climate data to understand the underlying triggers and disease dynamics. In this work, we develop a data extraction pipeline for the online outbreak reporting website ProMED-mail that uses a web scraper, a transformer neural network summarizer, and a named entity recognizer to obtain a dataset of malaria, dengue, Zika, and chikungunya outbreaks over the last 30 years. As a preliminary study of how global rainfall patterns affect the spread of vector-borne diseases, we analyzed the scraped dataset together with global rainfall anomalies derived from NASA’s Integrated Multi-satellitE Retrievals for GPM [Global Precipitation Measurement] (IMERG) dataset. Analysis of the ProMED-mail and GPM data shows that vector-borne disease outbreaks cluster in the tropics and are often amplified during the rainy seasons. Our scraped dataset can be a valuable tool for creating comprehensive georeferenced disease records for modeling and predicting future outbreaks.
