Natural language processing (NLP) and machine learning (ML) model for predicting CMS OP-35 categories among patients receiving chemotherapy.

Anshul Saxena, Paul Lindeman, Peter Mcgranaghan, Joseph Salami, Amanda Lindeman, Raees Tonse, Michelle Keller, Emir Veledar, Muni Rubens

doi:10.1200/jco.2021.39.15_suppl.e13591

Abstract

e13591 Background: The Hospital Outpatient Quality Reporting Program is a pay-for-quality data reporting program implemented by the Centers for Medicare & Medicaid Services (CMS). Hospitals collect data for CMS on various measures of the quality of care provided in outpatient settings. One such measure is OP-35, which collects data about patients who received chemotherapy in outpatient settings. Such quality measures help hospitals assess their performance and allow patients to compare the quality of care among hospitals in their region. Currently, data are labeled for OP-35 categories manually. This study aims to develop a model that uses NLP and ML to predict the ten OP-35 complication categories and automate the process.

Methods: Data from 1000 adult cancer patients who received chemotherapy at a comprehensive cancer center in the South Florida region between September and October 2019 were extracted to train the ML models. Text from the Chief Complaint field was manually labeled into ten binary categories: anemia, nausea, dehydration, neutropenia, diarrhea, emesis, pneumonia, fever, sepsis, and pain. The data were divided into a training set (80%) and a test set (20%). After initial pre-processing of the text, features were extracted using term frequency–inverse document frequency (TF-IDF) with a vocabulary size of 10,000. Several models, including stochastic gradient descent, support vector classification (SVC), and binary relevance, were trained to predict multiple labels. These models were evaluated using Jaccard score, accuracy, F1 score, and Hamming loss. Two deep learning approaches, a single dense output layer model and a multiple dense output layer model, were also used for comparison. Python version 3.8 was used for the analysis.

Results: The best performing model was SVC, with a Jaccard score of 85.13 and 90% accuracy. In the first deep learning approach, a single dense output layer was used with multiple neurons, where each neuron represented only one label. In the second approach, a separate dense layer with one neuron was created for each label. The model with a single output layer produced an accuracy score of 32%, and the model with multiple output layers had an accuracy score of 31%. Neither deep learning model performed well compared with SVC.

Conclusions: Our study gives an early indication of the feasibility of using modern ML techniques to predict multiple label categories or outcomes. As a potential clinical decision support system, this model could replace manual data entry, minimize human error, and reduce the resources needed for data collection. In the next stage, healthcare providers will validate this model by manually checking the predicted labels. In the final stage, the model will be deployed in real time to predict OP-35 categories automatically.
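
The evaluation metrics named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the four toy records are invented for illustration, the helper names (`encode`, `subset_accuracy`, `hamming_loss`, `jaccard_samples`, `micro_f1`) are hypothetical, and the abstract does not state which averaging scheme was used, so sample-averaged Jaccard and micro-averaged F1 are assumptions.

```python
from typing import List, Sequence, Set

# The ten OP-35 categories listed in the abstract.
LABELS = ["anemia", "nausea", "dehydration", "neutropenia", "diarrhea",
          "emesis", "pneumonia", "fever", "sepsis", "pain"]


def encode(names: Set[str]) -> List[int]:
    """Turn a set of category names into a ten-element binary vector."""
    return [1 if label in names else 0 for label in LABELS]


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of records whose full label vector is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


def jaccard_samples(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Mean per-record intersection-over-union of true and predicted labels."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        # Assumption: a record with no true and no predicted labels scores 1.0.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """F1 computed from label-slot counts pooled over all records."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    tp = sum(1 for a, b in pairs if a and b)
    fp = sum(1 for a, b in pairs if not a and b)
    fn = sum(1 for a, b in pairs if a and not b)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy records (manual labels vs. hypothetical model predictions).
y_true = [encode({"nausea", "emesis"}), encode({"fever", "neutropenia"}),
          encode({"pain"}), encode(set())]
y_pred = [encode({"nausea", "emesis"}), encode({"fever"}),
          encode({"pain", "anemia"}), encode(set())]

print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Jaccard (mean): ", jaccard_samples(y_true, y_pred))
print("micro F1:       ", micro_f1(y_true, y_pred))
```

On this toy data, exact-match accuracy is much harsher than Hamming loss: one missed or extra label makes the whole record count as wrong, while it changes only one of forty label slots. This is why multi-label studies commonly report several metrics side by side.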
