Abstract
Pancreatic cancer is a leading cause of cancer-related deaths worldwide, and its incidence is increasing. Early diagnosis of pancreatic cancer is a key challenge, as the disease is typically detected at a late stage. However, patients who present with early-stage disease can be cured by a combination of surgery, chemotherapy and radiotherapy. Thus, a better understanding of the risk factors for pancreatic cancer and detection at early stages have great potential to improve patient survival and reduce overall mortality from this aggressive malignancy. Here we apply advanced machine learning (ML) to the time sequence of clinical events in order to predict the risk of cancer occurrence over multi-year time intervals. The investigation was initially carried out using the Danish National Patient Registry (DNPR), whose data cover 41 years (1977 to 2018) of clinical records for 8.6 million patients, of whom about 40,000 had a diagnosis of pancreatic cancer. To maximize the predictive information extracted from these records, we tested ML methods ranging from regression and ML models with or without time dependence to sequence models such as gated recurrent units (GRU) and Transformers. We explicitly train the models on the time sequence of diseases in patients' clinical histories and test their ability to predict cancer occurrence within time intervals of 3 to 60 months after risk assessment. For cancer occurrence within 36 months, the performance of the best model (AUROC=0.88; OR=47.5 at 20% recall and OR=159.0 at 10% recall) substantially exceeds that of a model without time information, even when disease events within the 3 months before cancer diagnosis are excluded from training (AUROC[3m]=0.84). Independent training and testing on the Boston dataset yields comparable performance (AUROC=0.87; OR=112.0 at 20% recall and OR=162.4 at 10% recall).
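The abstract reports AUROC together with odds ratios (OR) at fixed recall levels, but does not spell out how the OR is computed. The sketch below shows one common way to obtain both from a model's risk scores, assuming that "OR at X% recall" means the diagnostic odds ratio (TP·TN)/(FP·FN) at the strictest score threshold that still flags at least X% of the true cancer cases. The function names `auroc` and `odds_ratio_at_recall` and the toy data are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def odds_ratio_at_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> float:
    """Diagnostic odds ratio (TP*TN)/(FP*FN) at the strictest score threshold
    whose recall (sensitivity) reaches target_recall.
    Assumption: this is how OR at a given recall is defined.
    Returns inf when FP or FN is zero at that threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Scan thresholds from strictest to most lenient; a patient is flagged
    # as high risk when their score is >= the threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if tp / n_pos >= target_recall:
            fn = n_pos - tp
            tn = n_neg - fp
            if fp == 0 or fn == 0:
                return math.inf
            return (tp * tn) / (fp * fn)
    raise ValueError("target recall cannot be reached")


if __name__ == "__main__":
    # Hypothetical toy risk scores; label 1 = cancer within the prediction interval.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print(round(auroc(scores, labels), 3))            # 0.81
    print(odds_ratio_at_recall(scores, labels, 0.5))  # 12.0
```

The pairwise loops are quadratic and only meant to make the definitions explicit; on cohorts of millions of patients one would instead use a sort-based routine such as scikit-learn's `roc_auc_score` and `roc_curve`. A lower target recall selects a stricter threshold and therefore a smaller, more enriched high-risk group, which is consistent with the abstract reporting a higher OR at 10% recall than at 20% recall.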
We also extract from the trained models an estimate of the contribution of individual disease features, e.g., obesity and diabetes, to the prediction. These results advance the state of the art in cancer risk prediction on real-world datasets and support the design of future screening trials for high-risk patients. AI applied to real-world clinical records has the potential to shift the focus from treatment of late-stage to early-stage cancer, benefiting patients by improving lifespan and quality of life. We expect further gains in prediction accuracy as data beyond disease codes, such as prescriptions, laboratory values, and images, become available.
Citation Format: Davide Placido, Bo Yuan, Jessica X. Hjaltelin, Amalie D. Haue, Piotr J. Chmura, Chen Yuan, Jihye Kim, Renato Umeton, Gregory Antell, Alexander Chowdhury, Alexandra Franz, Lauren Brais, Elizabeth Andrews, Debora S. Marks, Aviv Regev, Peter Kraft, Brian M. Wolpin, Michael Rosenthal, Søren Brunak, Chris Sander. AI predicts risk of pancreatic cancer from disease trajectories using real-world electronic health records (EHRs) from Denmark and the USA [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB550.