We developed a United States-based real-world data resource to better understand the continued impact of the coronavirus disease 2019 (COVID-19) pandemic on immunocompromised patients, who are typically underrepresented in prospective studies and clinical trials. The COVID-19 Real World Data infrastructure (CRWDi) was created by linking and harmonizing de-identified HealthVerity medical and pharmacy claims data from 1 December 2018 to 31 December 2023 with severe acute respiratory syndrome coronavirus 2 virologic and serologic laboratory data from major commercial laboratories and Northwell Health; COVID-19 vaccination data; and, for patients with cancer, National Cancer Institute Surveillance, Epidemiology, and End Results registry data from 2010 to 2021. The CRWDi contains 4 cohorts: patients with cancer; patients with rheumatic diseases receiving pharmacotherapy; noncancer solid organ and hematopoietic stem cell transplant recipients; and individuals from the general population, including adults and pediatric patients. The project successfully linked and harmonized longitudinal, de-identified data on 5.2 million unique patients using privacy-preserving record linkage techniques. The system was developed in early 2024 and rapidly deployed, enabling longitudinal analysis of patient healthcare across the full range of delivery settings and exploration of novel questions for populations at high risk for adverse outcomes. The successful development of the CRWDi enables researchers to address unanswered questions that have arisen during the COVID-19 pandemic. Because the data are broadly and freely available to academic researchers, this real-world data system is an important complement to the existing consortia and clinical trials that have emerged during the healthcare crisis, and it can be readily reproduced for future purposes.