816
Background: More than 44,000 patients are seen at MD Anderson Cancer Center annually. However, using the diverse, mostly unstructured data from these patient encounters has been challenging and has required manual chart review. In 2018, MD Anderson and Palantir Technologies (Denver, CO) began developing a unified, cloud-based clinical informatics platform with a graphical user interface to extract, structure, and integrate data from the different sources that comprise the electronic health record (EHR). Here, we describe our experience using this novel platform to apply a real-world evidence (RWE) approach to the study of patients with gastrointestinal (GI) malignancies.
Methods: Institutional Review Board approval for retrospective chart review of patients with GI malignancies was previously obtained. The Foundry platform was used to incorporate more than 150 datasets, including structured data elements such as lab values, unstructured data such as the full text of clinical notes, and natural language processing (NLP)-derived datasets. The datasets include unique patient identifiers that allow demographic, clinical, molecular, and outcomes information to be merged. The platform processes note text through NLP to convert non-discrete data elements into discrete form. In addition, it ingests new data daily, so that new patients' information is included automatically.
Results: From 2,013,048 patients with dates of diagnosis ranging from 1944 to 2024, we have created datasets for colorectal adenocarcinoma (CRC, >50,000 patients, >8,000 with molecular data), pancreatic adenocarcinoma (PDAC, >13,000 patients), and appendiceal adenocarcinoma (AA, >3,000 patients). More than 50 variables have been integrated, including demographic information, stage, grade, overall survival, and molecular information. Focused manual validation of the automated extraction across the cohorts consistently demonstrated an accuracy of over 94%. Work is underway to extract additional features, including disease-free survival (DFS), progression-free survival (PFS), and sites of metastasis, and to build out additional cohorts for biliary tract, upper GI, and neuroendocrine tumors. Initial discovery efforts have already led to multiple publications, including the discovery of molecular causes of racial and ethnic disparities in CRC, the survival impact of KRAS and co-mutations in PDAC, and the prognostic utility of serum tumor markers in AA.
Conclusions: An automated, highly dynamic platform allowed integration of comprehensive, multiparameter oncology datasets for patients with GI malignancies. This dramatically accelerated cohort identification and outcomes analysis and enabled a data-driven approach to guide decision-making in an effort to optimize outcomes.