The All of Us Research Program (AoU) is an initiative designed to gather a comprehensive and diverse dataset from at least one million individuals across the USA. This longitudinal cohort study aims to advance research by providing a rich resource of genetic and phenotypic information, enabling powerful studies on the epidemiology and genetics of human diseases. One critical challenge to maximizing its use is developing algorithms that efficiently and accurately identify well-defined disease cases and disease-free participants for case-control studies. This study aimed to develop and validate type 1 diabetes (T1D) and type 2 diabetes (T2D) algorithms in the AoU cohort using electronic health record (EHR) and survey data. Building on existing algorithms and drawing on diagnosis codes, medications, laboratory results, and survey data, we developed and implemented algorithms for identifying prevalent cases of T1D and T2D. The first set of algorithms used only EHR data (EHR-only), and the second set used a combination of EHR and survey data (EHR+). A universal algorithm was also developed to identify individuals without diabetes. The performance of each algorithm was evaluated by testing its association with polygenic scores (PSs) for T1D and T2D. We demonstrated the feasibility and utility of using AoU EHR and survey data to implement diabetes algorithms. For T1D, the EHR-only algorithm showed a stronger association with the T1D-PS than the EHR+ algorithm (DeLong p-value = 3 × 10⁻⁵). For T2D, the EHR+ algorithm outperformed both the EHR-only algorithm and the existing T2D definition provided in the AoU Phenotyping Library (DeLong p-values = 0.03 and 1 × 10⁻⁴, respectively), identifying 25.79% and 22.57% more cases, respectively, and showing an improved association with the T2D-PS. We provide a new validated T1D definition and an improved T2D definition in AoU, both freely available for diabetes research in AoU.
These algorithms ensure consistency of diabetes definitions in the cohort, facilitating high-quality diabetes research.