Abstract

1550
Background: Real-world data (RWD) are increasingly used to inform research, patient care, and population health in oncology; however, using RWD at scale requires accurate methods to identify clinically relevant attributes. Metastatic status is a highly relevant clinical attribute in cancer patients, but it is not routinely captured in structured formats, and its determination conventionally requires review and interpretation by certified tumor registrars (CTRs). Clinical diagnoses, treatments, imaging procedures, and other clinical variables documented in electronic health records (EHRs) can be used to differentiate metastatic from non-metastatic patients. This study describes a machine learning approach that uses prevalent, standardized data elements from EHRs across multiple health systems.
Methods: We analyzed 28,043 lung cancer and breast cancer patients from two large health systems within the Syapse Learning Health Network, with data drawn from CTR abstraction and EHRs. CTRs labeled each patient with a reference metastatic status. Patients were then split into training (n = 22,434) and testing (n = 5,609) cohorts, with cancer type and metastatic status distributed proportionately between cohorts. A regularized gradient boosting algorithm, XGBoost, was trained on more than 750 variables from patient records collected at or after the initial cancer diagnosis.
Results: Integrating ICD-10-CM codes with antineoplastic treatment history and radiologic imaging procedure orders increased the precision and recall of metastatic status prediction in lung cancer (21% and 32%, respectively) and breast cancer (39% and 9%, respectively), compared with using only ICD-10-CM diagnosis codes for secondary malignant neoplasms (Table). Adding treatment and procedure data from different cancer types improved model classification within individual cancer types.
Conclusions: One of the biggest challenges in using RWD for precision oncology is identifying clinically relevant phenotypes at scale. Here we demonstrate a scalable, evidence-based method that uses structured data from two separate health systems to impute metastatic status with high predictive power. With further validation, this approach may be generalized to other cancer types, applied to temporal slices of data to identify changes in metastatic status, and used to provide a high-confidence designation of metastatic status for other use cases such as staging. [Table: see text]
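
The cohort split and the evaluation metrics described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the record fields (`cancer`, `metastatic`), the synthetic records, and the 20% test fraction (chosen only because 5,609 of 28,043 is roughly 20%) are all assumptions made for illustration. The sketch stratifies on cancer type and metastatic status together so both cohorts keep similar proportions, then computes precision and recall from reference and predicted labels.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records so every stratum contributes the same share to the test set.

    `key` maps a record to its stratum, e.g. (cancer type, metastatic status).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    train, test = [], []
    for stratum in sorted(strata):
        group = strata[stratum]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (metastatic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic, hypothetical records: 50 patients per (cancer, status) stratum.
records = [
    {"cancer": cancer, "metastatic": status}
    for cancer in ("lung", "breast")
    for status in (True, False)
    for _ in range(50)
]
train, test = stratified_split(
    records, key=lambda r: (r["cancer"], r["metastatic"])
)
```

In practice the same precision and recall computation would be run twice on the held-out cohort, once for a baseline that flags only secondary-malignancy diagnosis codes and once for the trained classifier, so the two can be compared per cancer type.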
