Abstract

Objectives

Federal open-data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial partners. These initiatives advance understanding of health conditions and diseases by providing data to researchers, scientists, and policymakers for analysis, collaboration, and use outside the Centers for Disease Control and Prevention (CDC), particularly for emerging conditions such as COVID-19, for which data needs are constantly evolving. Since the beginning of the pandemic, CDC has collected person-level, de-identified data from jurisdictions and currently has more than 8 million records. We describe how CDC designed and produces 2 de-identified public datasets from these collected data.

Methods

We included data elements based on usefulness, public request, and privacy implications; we suppressed some field values to reduce the risk of re-identification and exposure of confidential information. We created datasets and verified them for privacy and confidentiality by using data management platform analytic tools and R scripts.

Results

Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available with a data-use agreement through a private repository on GitHub.com.

Practice Implications

Enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect the privacy of de-identified people allows for improved data use. Automating data-generation procedures improves the volume and timeliness of sharing data.
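The suppression step mentioned in the Methods can be illustrated with a small sketch. This is not CDC's algorithm, which the abstract does not specify; it assumes a simple rule in which a combination of quasi-identifying fields is blanked whenever fewer than `k` records share it. The function name `suppress_rare_combinations`, the field names, the threshold `k=5`, and the `"NA"` placeholder are all hypothetical, and Python is used here for illustration even though the authors report using R scripts.

```python
from collections import Counter

SUPPRESSED = "NA"  # hypothetical placeholder for a suppressed value


def suppress_rare_combinations(records, quasi_identifiers, k=5):
    """Return copies of records in which the quasi-identifier fields are
    blanked whenever their combination occurs in fewer than k records.

    Assumed rule for illustration only; not the published CDC procedure.
    """
    # Count how many records share each combination of quasi-identifiers.
    counts = Counter(
        tuple(rec[field] for field in quasi_identifiers) for rec in records
    )
    released = []
    for rec in records:
        key = tuple(rec[field] for field in quasi_identifiers)
        out = dict(rec)  # copy so the input records are left unchanged
        if counts[key] < k:
            for field in quasi_identifiers:
                out[field] = SUPPRESSED
        released.append(out)
    return released


# Toy data: six records share one combination, one record is unique.
records = (
    [{"age_group": "20-29", "county": "A", "sex": "F"}] * 6
    + [{"age_group": "80+", "county": "B", "sex": "M"}]
)
released = suppress_rare_combinations(records, ["age_group", "county"], k=5)
```

In this toy input the six matching records pass through unchanged, while the single rare record has its `age_group` and `county` blanked and keeps its other field. A production pipeline would also need to re-check the output, since suppressed rows can themselves form a small group.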

Highlights

  • Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available with a data-use agreement through a private repository on GitHub.com

  • Data-sharing initiatives are important during the COVID-19 pandemic, when data needs are constantly evolving and there is much to learn about the disease

  • To support the most users, CDC releases these data according to the FAIR Guiding Principles of findability, accessibility, interoperability, and reusability,[18] including the use of machine-readable comma-separated values (CSV) formats and an open-standards–compliant application programming interface


