Abstract
e16268 Background: Pancreatic cancer (PC) has a uniquely poor survival rate due to the absence of proven and effective methods for early detection. We therefore aimed to apply recent advances in deep learning to infer early risk of PC from longitudinal laboratory test data contained within electronic health record (EHR) data. Methods: We developed a novel deep learning framework that incorporates longitudinal clinical data from EHR to infer risk of PC. The framework includes a novel training protocol that enforces an emphasis on early detection by applying an independent Poisson-random mask to proximal-time measurements for each variable. Data fusion for irregular multivariate time-series features is enabled by a “grouped” neural network (GrpNN) architecture, which uses representation learning to generate a dimensionally reduced vector for each measurement set before generating a final prediction. The models were evaluated using EHR data from the Tripartite Request Assessment Committee (TRAC). Results: At 12 months prior to diagnosis, our framework demonstrated better early-detection performance (AUROC 0.671, 95% CI 0.667–0.675, p < 0.001) than a logistic regression baseline and a feedforward neural network baseline (black-box model). We demonstrate that our masking strategy yields greater improvements at distal times prior to diagnosis, and that our GrpNN model improves generalizability by reducing overfitting relative to the feedforward baseline (Table). The results were consistent across reported race. Conclusions: Our study presents new approaches for integrating multimodal longitudinal clinical data with bias reduction strategies, which together improve early detection of PC. This study demonstrates for the first time the utility of multivariate time-series laboratory test results for early detection of PC.
Our proposed algorithm is potentially generalizable to improving risk prediction for other types of cancer and other diseases where early detection can improve survival. We split the data into a training set (80%) and a hold-out set (20%) and report mean AUROC and AUPRC with 95% confidence intervals. [Table: see text]
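The abstract does not specify how the Poisson-random mask is constructed, so the following is only a minimal sketch of one plausible reading: for each variable, an independent count is drawn from a Poisson distribution, and that many of the most recent (proximal) observed measurements are hidden during training. The function name `poisson_proximal_mask`, the `rate` parameter, the use of NaN for hidden or missing values, and the oldest-to-newest column ordering are all assumptions made for illustration, not details from the study.

```python
import numpy as np


def poisson_proximal_mask(values, rate=2.0, rng=None):
    """Hide the most recent measurements of each variable (illustrative only).

    values: array of shape (n_variables, n_timesteps), ordered from oldest
            to most recent; NaN marks a measurement that is already missing.
    rate:   assumed Poisson mean for the number of measurements hidden per
            variable (hypothetical parameter, not taken from the abstract).
    Returns the masked copy and the per-variable Poisson draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = np.array(values, dtype=float, copy=True)

    # One independent Poisson draw per variable.
    counts = rng.poisson(rate, size=masked.shape[0])

    for v, k in enumerate(counts):
        if k == 0:
            continue  # nothing hidden for this variable
        observed = np.flatnonzero(~np.isnan(masked[v]))
        # Hide the k most recent observed values (all of them if k is larger).
        masked[v, observed[-k:]] = np.nan

    return masked, counts
```

Applied afresh to each training example, a mask of this kind would push the model to rely on measurements further from the diagnosis date, which is consistent with the stated emphasis on early detection. How the rate is chosen, and whether masking acts on measurement counts or on time windows, is left open here.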