Abstract

Anticancer therapy changes tumor physiology and genomics, making it a key variable in cancer studies. Although antineoplastics given at a single institution may be available in research-ready format, treatment received at external institutions before care at an academic medical center, which is common among patients at these centers, is often described only in free-text clinical notes, necessitating manual curation for downstream analysis. To overcome this bottleneck, we trained and validated natural language processing (NLP) models using initial consult notes to identify whether patients had received treatment at external institutions and studied the impact of these putative treatments on tumor genomics. Training data were derived from the AACR Project GENIE Biopharma Collaborative (BPC) for 2,663 patients at Memorial Sloan Kettering (MSK) across four cancer types. For each patient, we selected initial visits with medical and radiation oncologists based on an a priori note prioritization scheme and determined “ground-truth” prior external medications based on manually curated BPC administration records, whitelisting MSK-given medications. We trained logistic regression and clinical longformer models to identify external treatment receipt and evaluated model performance with 5-fold cross-validation. The clinical longformer model performed best across evaluation metrics, with an average area under the receiver operating characteristic curve of 0.972, macro-averaged precision/recall of 0.854/0.902, and macro-averaged F1 score of 0.876. Re-review of discrepant cases suggested that 75% of “false positives” may be due to curation error. We used our model to infer treatment status in a pan-cancer cohort with tumor genomic profiling using our institutional sequencing platform. Of 48,447 patients, 11,900 were predicted to have received external treatment.
Patients with putative external treatment had higher alteration frequencies in resistance-related genes than untreated patients, and frequencies comparable to those of known pre-treated patients, including ESR1 in patients with breast cancer, AR in patients with prostate cancer, and EGFR T790M in patients with EGFR-mutated non-small cell lung cancer. Patients with putative external treatment, like known pre-treated patients, had shorter survival than treatment-naïve patients of the same cancer type. NLP can abstract external treatment status from clinical notes. When applied at scale, our model could help mitigate confounding variables and identify relationships between clinicogenomic variables and anticancer therapy.

Citation Format: Thinh N. Tran, Karl B. Pichotta, Si-Yang Liu, Christopher Fong, Anisha Luthra, Brooke Mastrogiacomo, Steven Maron, Deborah Schrag, Sohrab P. Shah, Pedram Razavi, Bob T. Li, Gregory J. Riely, Nikolaus Schultz, Justin Jee. Identification of anti-neoplastic therapy given before initial visit at a referral center using natural language processing applied to medical oncology initial consultation notes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4259.
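The abstract reports an area under the receiver operating characteristic curve along with macro-averaged precision, recall, and F1. The following is a minimal sketch of how such metrics can be computed for a binary "external treatment" label. It is not the authors' code: the function names `macro_metrics` and `auroc`, the 0.5 decision threshold, and the toy labels and scores are all hypothetical. Macro F1 is assumed here to be the unweighted mean of the per-class F1 scores; the abstract does not state which definition was used.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Unweighted mean of per-class precision, recall, and F1 (assumed definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = received external treatment, 0 = did not.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_macro = macro_metrics(toy_labels, toy_preds)
toy_auroc = auroc(toy_labels, toy_scores)
```

Under 5-fold cross-validation, these functions would be applied to each held-out fold and the resulting values averaged. Macro averaging weights both classes equally, which matters when externally treated patients are a minority of the cohort.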