Abstract

Introduction: Natural language processing has recently achieved unprecedented performance on several medical tasks, but further improvement is needed in oncology. Moreover, very few projects have assessed the potential of language models to help prevent the most frequent serious or severe medical events in medical oncology. We aimed to predict nausea or vomiting (ICD-10 code R11) and fatigue or malaise (ICD-10 code R53) from patients' medical reports.

Material and methods: The study included all patients of Centre Léon Bérard between 2000 and 2023 who had not refused to share their data for this analysis. We retrieved all clinical notes and the manual ICD-10 coding of hospitalization stays, all in French. We pretrained a BERT language model on these data using a masking strategy, then fine-tuned it and compared it with several open-source medical pretrained models (DrBert and K-memBERT). The labels were medical events leading to, or associated with, a hospitalization within the 90 days following each patient note. For OncoBERT, our model, we included sequential reports from the patient's history together with a time-encoding layer, and integrated them in a final transformer layer.

Results: We analyzed 140,523 patients, representing 2,515,957 pseudonymized text reports and 6.6M hospitalization codes in total. The medical texts were consultation reports (56%), end-of-stay reports (17%) and hospitalization summaries (9%). At each time point, the most frequent oncology treatments received by patients were chemotherapy (18.8%), targeted therapies (10.8%) and immunotherapies (1.3%). The most frequent medical events were nausea or vomiting (at least one event in 20% of patients) and fatigue or malaise (at least one event in 18% of patients). In the final dataset, nausea or vomiting (R11) accounted for 16% of the labels and fatigue or malaise (R53) for 24.7%.
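The labeling rule described in the methods (an event counts when a coded hospitalization occurs within 90 days after a note) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Note` and `Stay` record types and the `label_note` function are hypothetical, and the sketch assumes that the window starts strictly after the note date and ends 90 days later (inclusive), and that any ICD-10 code on the stay starting with R11 or R53 counts as an event.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical record types: the abstract does not describe the data schema.
@dataclass
class Note:
    patient_id: str
    note_date: date

@dataclass
class Stay:
    patient_id: str
    admission_date: date
    icd10_codes: set = field(default_factory=set)

TARGET_CODES = ("R11", "R53")  # nausea/vomiting, fatigue/malaise
HORIZON = timedelta(days=90)   # prediction window stated in the abstract

def label_note(note, stays, horizon=HORIZON, targets=TARGET_CODES):
    """Return {target: 0 or 1} for one note.

    Assumption: a target is positive if a stay of the same patient starts
    strictly after the note date and at most `horizon` later, and one of
    its codes starts with the target (so "R11.2" matches "R11").
    """
    labels = {code: 0 for code in targets}
    for stay in stays:
        if stay.patient_id != note.patient_id:
            continue
        if not (note.note_date < stay.admission_date <= note.note_date + horizon):
            continue
        for code in targets:
            if any(c.startswith(code) for c in stay.icd10_codes):
                labels[code] = 1
    return labels

if __name__ == "__main__":
    note = Note("p1", date(2020, 1, 1))
    stays = [
        Stay("p1", date(2020, 2, 15), {"R11.2", "C50.9"}),  # 45 days later
        Stay("p1", date(2020, 6, 1), {"R53"}),              # outside the window
        Stay("p2", date(2020, 1, 10), {"R53"}),             # another patient
    ]
    print(label_note(note, stays))  # {'R11': 1, 'R53': 0}
```

Each note thus yields one binary label per target code, which is the multi-label setting that the macro-averaged metric in the results summarizes.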
We performed random undersampling of reports without any event to balance the labeled dataset. After fine-tuning on R11 and R53, OncoBERT achieved a macro AUC-PR of 0.58 on the validation set, versus 0.50 for the best open-source model.

Conclusion: The language models achieved high performance in predicting the most frequent serious medical events in our hospital dedicated to cancer care. We plan to validate these models externally in collaborating hospitals and prospectively, and to improve the interpretations, which we will present at the congress.

Citation Format: Raphael Vienne, Quentin Filori, Vincent Susplugas, Hugo Crochet, Loic Verlingue. Prediction of nausea or vomiting, and fatigue or malaise in cancer care [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3475.
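The results mention random undersampling of event-free reports and a macro AUC-PR metric, without giving the sampling ratio or the AUC-PR estimator. The following is a minimal standard-library sketch under stated assumptions, not the authors' code: the function names and the `ratio` parameter (event-free reports kept per report with an event, 1:1 by default) are hypothetical, and AUC-PR is estimated here with non-interpolated average precision, one common estimator among several.

```python
import random

def undersample_negatives(labels, ratio=1.0, seed=0):
    """Keep every report with at least one event plus a random subset of
    event-free reports. `ratio` (hypothetical) = negatives kept per positive.
    Returns the sorted indices of the kept reports."""
    positives = [i for i, y in enumerate(labels) if any(y)]
    negatives = [i for i, y in enumerate(labels) if not any(y)]
    n_keep = min(len(negatives), round(ratio * len(positives)))
    kept = random.Random(seed).sample(negatives, n_keep)
    return sorted(positives + kept)

def average_precision(y_true, y_score):
    """Non-interpolated average precision for one label: the mean of the
    precision measured at the rank of each positive, ranking by descending
    score (tied scores keep their input order, a simplification)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / n_pos

def macro_average_precision(y_true_rows, y_score_rows):
    """Unweighted mean of the per-label average precision (macro average)."""
    n_labels = len(y_true_rows[0])
    per_label = [
        average_precision([r[k] for r in y_true_rows], [r[k] for r in y_score_rows])
        for k in range(n_labels)
    ]
    return sum(per_label) / n_labels

if __name__ == "__main__":
    labels = [(1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]  # (R11, R53)
    print(undersample_negatives(labels))  # 2 positives + 2 random negatives

    y_true = [(1, 1), (0, 0), (1, 1), (0, 0)]
    y_score = [(0.9, 0.9), (0.1, 0.8), (0.8, 0.7), (0.2, 0.1)]
    print(round(macro_average_precision(y_true, y_score), 4))  # 0.9167
```

scikit-learn's `average_precision_score(y_true, y_score, average="macro")` computes a comparable macro average on multi-label indicator arrays, though its handling of tied scores can differ from this sketch. Undersampling is usually applied to the training split only, so that validation metrics reflect the real event rate; the abstract does not state which splits were undersampled.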