Epi Archive: Automated Synthesis of Global Notifiable Disease Data

Hari S Khalsa, Nicholas Generous, Sergio Rene Cordova

doi:10.5210/ojphi.v11i1.9707

Open Access

https://doi.org/10.5210/ojphi.v11i1.9707

Abstract

Objective

Automatically collect and synthesize global notifiable disease data and make it available to humans and computers. Provide the data on the web and within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Introduction

Government reporting of notifiable disease data is common and widespread, though most countries do not report in a machine-readable format. This is despite the WHO International Health Regulations stating that “[e]ach State Party shall notify WHO, by the most efficient means of communication available.” [1] Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health. While most nations likely store incident data in a machine-readable format, governments can be hesitant to share data openly for a variety of reasons, including technical, political, economic, and motivational ones [2].

A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language or graphical charts. The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data. Not all websites publish in English, and it is common to find typos and clerical errors.

LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and make it uniform and readily accessible.

Methods

A survey of national notifiable disease reporting systems is conducted periodically, noting how the data are reported and in what formats. We determined the minimal metadata that is required to contextualize incident counts properly, as well as optional metadata that is commonly found.

The development of software to regularly ingest notifiable disease data and make it available involves three to four main steps: scraping, detecting, parsing and persisting.

Scraping: We examine website design and determine reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then design and write code to automate the downloading of data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.

Detecting: This step is required when parsing structured non-machine-readable data, such as tabular data in PDF files. We combine the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [3].

Parsing: We determine what to extract from each dataset and parse these data into uniform data structures, correctly accommodating the variations in metadata (e.g., time interval definitions) and the various human languages.

Persisting: We store the data in the Epi Archive database and make it available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.

Results

Epi Archive currently contains national and/or subnational notifiable disease data from thirty-nine nations. When a user accesses the Epi Archive site, they are able to peruse, chart and download data by country, subregion, disease and time interval. Access to a cached version of the original artifacts (e.g., PDF files), a link to the source and additional metadata is also available through the user interface. Finally, to ensure machine-readability, the data from Epi Archive can be reached through a REST API.

http://epiarchive.bsvgateway.org/

Conclusions

LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While thirty-nine nations are in production, more will be brought online in the near future. These data are already being utilized and have many applications, including improving the prediction and early warning of disease events.
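
As an illustration only, the scraping, parsing and persisting steps described under Methods might be sketched as below. This is not the Epi Archive implementation: the table layout, the `DISEASE_NAMES` lookup, the example URL, the `XX` country code and all function names are assumptions made for this sketch. The download itself and the PDF table detection step are omitted, and the input is assumed to be an already machine-readable weekly report using ISO week numbers.

```python
import csv
import hashlib
import io
import sqlite3
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical lookup mapping source-language disease names to one uniform name.
DISEASE_NAMES = {"sarampión": "measles", "measles": "measles", "dengue": "dengue"}

# Assumed normalized schema; the real Epi Archive schema is not described in the abstract.
SCHEMA = """
CREATE TABLE artifact (id INTEGER PRIMARY KEY, url TEXT, sha256 TEXT, content BLOB);
CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE incident (
    id INTEGER PRIMARY KEY,
    artifact_id INTEGER REFERENCES artifact(id),
    country TEXT,
    disease_id INTEGER REFERENCES disease(id),
    start_date TEXT,
    end_date TEXT,
    cases INTEGER
);
"""


@dataclass(frozen=True)
class Record:
    country: str
    disease: str
    start: date  # first day of the reporting interval
    end: date    # last day of the reporting interval
    cases: int


def store_artifact(db, url, raw):
    """Scraping step (download omitted): keep the original bytes plus provenance."""
    digest = hashlib.sha256(raw).hexdigest()
    cur = db.execute(
        "INSERT INTO artifact (url, sha256, content) VALUES (?, ?, ?)",
        (url, digest, raw),
    )
    return cur.lastrowid


def parse_weekly_csv(raw, country):
    """Parsing step: turn 'disease,year,week,cases' rows into uniform records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw.decode("utf-8"))):
        # Convert an ISO year/week pair into an explicit Monday-to-Sunday interval.
        start = date.fromisocalendar(int(row["year"]), int(row["week"]), 1)
        disease = DISEASE_NAMES[row["disease"].strip().lower()]
        records.append(
            Record(country, disease, start, start + timedelta(days=6), int(row["cases"]))
        )
    return records


def persist(db, artifact_id, records):
    """Persisting step: write records into the normalized tables."""
    for r in records:
        db.execute("INSERT OR IGNORE INTO disease (name) VALUES (?)", (r.disease,))
        (disease_id,) = db.execute(
            "SELECT id FROM disease WHERE name = ?", (r.disease,)
        ).fetchone()
        db.execute(
            "INSERT INTO incident (artifact_id, country, disease_id, start_date, end_date, cases) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (artifact_id, r.country, disease_id, r.start.isoformat(), r.end.isoformat(), r.cases),
        )
    db.commit()


def run_demo():
    """Run the three steps on a tiny made-up report and return the stored rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    raw = "disease,year,week,cases\nSarampión,2019,1,4\nDengue,2019,1,12\n".encode("utf-8")
    artifact_id = store_artifact(db, "https://example.org/report.csv", raw)
    persist(db, artifact_id, parse_weekly_csv(raw, "XX"))
    return db.execute(
        "SELECT d.name, i.start_date, i.end_date, i.cases FROM incident i "
        "JOIN disease d ON d.id = i.disease_id ORDER BY d.name"
    ).fetchall()
```

Storing the untouched artifact alongside a hash keeps each parsed count traceable to its source file, which is one plausible way to support the data provenance the Methods mention. Converting week numbers to explicit start and end dates is one way to accommodate differing time interval definitions across countries.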

Journal: Online Journal of Public Health Informatics
Publication Date: May 30, 2019
Citations: 1
License type: cc-by

Similar Papers

Epi Archive: Automated Synthesis of Global Notifiable Disease Data
Hari S Khalsa ... Prabhu S Khalsa
Online Journal of Public Health Informatics | VOL. 10
22 May 2018

Epi Archive: automated data collection of notifiable disease data
Nicholas Generous ... James Arnold
Online Journal of Public Health Informatics | VOL. 9
02 May 2017

The potential effectiveness of the WHO International Health Regulations capacity requirements on control of the COVID-19 pandemic: a cross-sectional study of 114 countries.
Martin Cs Wong ... Jeremy Yuen-Chun Teoh
Journal of the Royal Society of Medicine | VOL. 114
09 Feb 2021

Global Health Surveillance: Innovation and Coordination for Broad Health Impact
Ray L Ransom ... Ruth Kigozi
Online Journal of Public Health Informatics | VOL. 9
02 May 2017
