Abstract

322

Background: Natural language processing (NLP) algorithms can be leveraged to better understand prevailing themes in healthcare conversations. Sentiment analysis, an NLP technique for analyzing and interpreting sentiment from text, has been validated on Twitter in tracking natural disasters and disease outbreaks. To establish its role in healthcare discourse, we sought to explore the feasibility and accuracy of sentiment analysis on Twitter posts ("tweets") related to prior authorizations (PAs), a common occurrence in oncology intended to curb payer concerns about the costs of cancer care, but one that can obstruct timely and appropriate care and increase administrative burden and clinician frustration.

Methods: We identified tweets related to PAs between 03/09/2021 and 04/29/2021 using pre-specified keywords (e.g., #priorauth) and used Twarc, a command-line tool and Python library for archiving Twitter JavaScript Object Notation (JSON) data. We performed sentiment analysis using two NLP models: (1) TextBlob (trained on movie reviews) and (2) VADER (trained on social media). These models report a polarity score between -1 and 1, which maps to a sentiment of "positive" (>0), "neutral" (exactly 0), or "negative" (<0). We (AG, NP) manually reviewed all tweets to establish the ground truth (human interpretation of reality), including a notation for sarcasm, since the models are not trained to detect it. We calculated precision (positive predictive value), recall (sensitivity), and the F1 score (a measure of accuracy ranging from 0 to 1, where 0 = failure and 1 = perfect) for the models against the ground truth.

Results: After preprocessing, 964 tweets (mean 137/week) met our inclusion criteria for sentiment analysis. The two existing NLP models labeled 42.4%-43.3% of tweets as positive, compared with the ground truth (5.6% of tweets positive). F1 scores of the models across labels ranged from 0.18 to 0.54. We noted sarcasm in 2.8% of tweets. Detailed results in Table.
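The polarity-to-sentiment mapping described in Methods can be sketched as follows. This is a minimal illustration, not the study's code: `label_tweet` is a hypothetical helper name, and the commented calls show how TextBlob and VADER would typically supply the polarity score.

```python
# Sketch of the labeling rule from Methods: polarity > 0 is "positive",
# exactly 0 is "neutral", and < 0 is "negative".
# In practice the score would come from one of the two models, e.g.:
#   TextBlob(text).sentiment.polarity
#   SentimentIntensityAnalyzer().polarity_scores(text)["compound"]

def label_tweet(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"
```

For example, `label_tweet(0.3)` yields "positive", and `label_tweet(0.0)` yields "neutral"; note that this rule assigns "positive" to any score above zero, however slight, which may contribute to the over-labeling of positive sentiment reported in Results.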
Conclusions: We demonstrate the feasibility of performing sentiment analysis on a topic of high interest within clinical oncology and the deficiency of existing NLP models in capturing sentiment within oncologic Twitter discourse. Ongoing iterations of this work will further train these models through better identification of the tweeter (patient vs. health care worker) and other analytics from shared content. [Table: see text]
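The per-label precision, recall, and F1 comparison against ground truth described in Methods can be sketched in plain Python. This is an illustrative implementation under the standard definitions, not the study's code; `per_label_scores` is a hypothetical name, and a library such as scikit-learn would typically be used for this instead.

```python
def per_label_scores(truth, pred, label):
    """Precision, recall, and F1 for one sentiment label, comparing
    model predictions (pred) against human annotations (truth)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(truth, pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(truth, pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0     # sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean, range 0-1
    return precision, recall, f1
```

For instance, if the ground truth is ["positive", "negative", "neutral", "negative"] and a model predicts ["positive", "positive", "neutral", "negative"], then for the "positive" label precision is 0.5, recall is 1.0, and F1 is about 0.67.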
