Improving serious illness conversations in oncology: A machine learning approach that integrates natural language processing for mortality prediction.

Prathamesh Parchure, Madhu Mazumdar, Cardinale B Smith, Min-Heng Wang, Arash Kia, Livingston Graham, Ksenia O Gorbenko, Marcos Vargas

doi:10.1200/op.2023.19.11_suppl.590

Abstract

590 Background: ML-based mortality prediction tools in oncology can optimize clinical decisions and prompt end-of-life care discussions. Patients with advanced cancer who have engaged in Goals of Care (GoC) conversations report improved quality of life and better care alignment. However, oncologists often give overly optimistic prognoses and miss the window for timely GoC discussions. Clinical notes are a valuable source of information, but processing and extracting data from them is time-consuming and labor-intensive. To address this issue, we developed a machine learning application that ingests clinical notes and structured data from electronic health records (EHRs) to generate a 180-day mortality risk estimate and prompt oncologists to hold GoC conversations.
Methods: A predictive machine learning model was developed using data from cancer patients aged 21 and above, diagnosed between January 2016 and December 2021. Data were collected from several sources, including cancer and death registries and the EHR. A clinical profile was created for each patient by analyzing structured and unstructured data from ambulatory progress notes. The model used Spark-NLP for preprocessing, applying word2vec embeddings and pre-trained NER models to extract information on diseases, symptoms, procedures, treatments, and medications. Feature engineering techniques were used to select the best NLP features, which were then combined with structured data. The model was trained using 894 patients, employing a Random Forest classifier with 10-fold cross-validation, and tested on a separate set of 43,274 patients. Performance evaluation included ROC AUC, PR AUC, and F1 score metrics.
Results: After fine-tuning, the best model showed an AUC-ROC of 0.88 on the training set and 0.75 on the test set. At a threshold of 0.44, the model achieved a balanced performance with a sensitivity of 0.70 and specificity of 0.71 on the test set.
Conclusions: Our team developed an automated multi-modality pipeline that combines unstructured real-world data with structured data, allowing for training and testing of a fusion model. This automation supports scaling and dissemination of the approach to enhance mortality prediction. Future work will involve qualitative analysis of implementation and acceptance in clinical practice.
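
The modeling and evaluation steps described in Methods and Results can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from make_classification as a stand-in for the fused structured and NLP-derived features, assumes stratified folds (the abstract states only "10-fold"), and uses average precision as an estimate of PR AUC. The hyperparameters and the helper name evaluate are hypothetical. Only the 0.44 threshold is taken from the abstract, and it has no clinical meaning on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)

# Synthetic stand-in for the fused feature matrix (assumption: the real
# features come from EHR fields and NER output, which are not available here).
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Hypothetical hyperparameters; the abstract does not report them.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 10-fold cross-validation on the training set, scored on out-of-fold
# predicted probabilities.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
oof_prob = cross_val_predict(model, X_train, y_train, cv=cv,
                             method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y_train, oof_prob)

# Refit on the full training set, then score the held-out test set.
model.fit(X_train, y_train)
test_prob = model.predict_proba(X_test)[:, 1]


def evaluate(y_true, prob, threshold):
    """Return ranking metrics plus threshold-dependent metrics."""
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    return {
        "roc_auc": roc_auc_score(y_true, prob),
        "pr_auc": average_precision_score(y_true, prob),  # PR AUC estimate
        "f1": f1_score(y_true, pred),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


THRESHOLD = 0.44  # operating point reported in the abstract
metrics = evaluate(y_test, test_prob, THRESHOLD)
print(f"10-fold CV ROC AUC: {cv_auc:.3f}")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Raising the threshold trades sensitivity for specificity, so an operating point such as 0.44 would normally be chosen on validation data to balance the two, as the Results describe.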

Full Text