Abstract

Data quality significantly impacts the reproducibility and validity of clinical research. The All of Us Research Program aims to collect biomedical data from one million or more participants, with the goal of enabling discoveries and improving targeted management of diseases such as type 2 diabetes (T2D). All of Us data include whole genome sequences (WGS), electronic health records (EHRs), device measurements of physical activity, physical measures, and surveys, all available for analysis on the Researcher Workbench. The program's Spring 2023 curated data release makes All of Us the world's largest, most diverse genomic dataset of its kind available for broad research use; it includes 413,457 participants, of whom 60.4% are female and 44.6% are non-white. To ensure maximum utility of this dataset for T2D researchers, we set out to quantify data quality among participants who have T2D diagnosis codes in their EHRs. We identified the measurements and laboratory tests (hemoglobin A1C, height, weight, body mass index [BMI]), medications (insulin and non-insulin), and procedures commonly used in T2D research. We quantified data fitness using five dimensions of quality: completeness, concordance (i.e., agreement), conformance to data standards, plausibility, and temporality. Among 287,012 participants who shared EHRs, 40,093 (14%) had T2D diagnosis codes, with a mean age at first diagnosis of 56 years. Regarding measurements, 68% had A1C, 89% had height, and 70% had weight recorded in their EHR. For plausibility, 99.7% of weights, 99.9% of heights, and 99.3% of BMI values were valid, and 97% of heights and weights had high concordance values. For medications, 99.7% of T2D participants had T2D medications, and 47.6% were prescribed after T2D diagnosis. Of these T2D participants, 69% have WGS or array data available. Our analysis shows that the All of Us dataset is a valuable, high-quality resource for T2D phenotyping and diagnosis research.
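
As an illustration of how three of the quality dimensions named above might be computed, the following is a minimal sketch on a toy table, not the study's code. Everything in it is an assumption made for illustration: the `Record` type, the field names, the plausibility ranges, and the BMI tolerance are hypothetical. Concordance is shown here as agreement between a recorded BMI and one recomputed from height and weight; the abstract does not state how concordance was defined.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical plausibility ranges and tolerance, chosen only for illustration;
# these are NOT the thresholds used in the study.
PLAUSIBLE_HEIGHT_CM = (100.0, 250.0)
PLAUSIBLE_WEIGHT_KG = (25.0, 350.0)
BMI_TOLERANCE = 1.0  # assumed maximum |recorded - recomputed| BMI, in kg/m^2


@dataclass
class Record:
    """One participant's measurements (hypothetical schema)."""
    person_id: int
    height_cm: Optional[float]
    weight_kg: Optional[float]
    bmi: Optional[float]


def completeness(records: List[Record], field: str) -> float:
    """Fraction of records with a non-missing value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if getattr(r, field) is not None)
    return present / len(records)


def plausibility(records: List[Record], field: str, low: float, high: float) -> float:
    """Fraction of non-missing values of `field` that fall within [low, high]."""
    values = [getattr(r, field) for r in records if getattr(r, field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if low <= v <= high) / len(values)


def bmi_concordance(records: List[Record], tolerance: float = BMI_TOLERANCE) -> float:
    """Fraction of fully populated records whose recorded BMI agrees with
    weight / height^2 to within `tolerance`."""
    checked = 0
    agree = 0
    for r in records:
        if r.height_cm is None or r.weight_kg is None or r.bmi is None:
            continue  # concordance is only defined when all three values exist
        if r.height_cm <= 0:
            continue  # avoid division by zero on invalid heights
        recomputed = r.weight_kg / (r.height_cm / 100.0) ** 2
        checked += 1
        if abs(recomputed - r.bmi) <= tolerance:
            agree += 1
    return agree / checked if checked else 0.0


# Toy data, invented for this example.
records = [
    Record(1, 170.0, 80.0, 27.7),
    Record(2, 165.0, None, None),   # weight and BMI missing
    Record(3, 17.0, 70.0, 24.0),    # height likely entered in the wrong unit
    Record(4, 180.0, 95.0, 35.0),   # recorded BMI disagrees with height and weight
]

print("weight completeness:", completeness(records, "weight_kg"))
print("height plausibility:", plausibility(records, "height_cm", *PLAUSIBLE_HEIGHT_CM))
print("BMI concordance:", bmi_concordance(records))
```

Reporting each dimension as a fraction of the eligible records mirrors the percentages quoted above, and it keeps the denominators explicit: completeness is measured over all records, plausibility over non-missing values, and concordance over records where every compared value is present.
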
Our quality report and code will be available for replication and reuse by researchers with the upcoming release.

Disclosure

L. Sulieman: None. J. Giannini: None. E. Dede Yildirim: None. Y. Ostchega: None. E. Ochsenfaber: None. L. Berman: None. A. Ramirez: n/a.
