Abstract

Background: Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported to perform little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are increasingly being explored. One avenue of research involves using sentiment analysis to examine clinicians’ subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs and to test correlations between the sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.

Objective: This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment.

Methods: The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the six lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score.

Results: The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful.

Conclusions: Our findings indicate that these six sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non–suicide-related EHR texts.
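The Methods describe two computational steps: extracting representative keywords by comparing word frequency distributions between the case and control subcorpora, and scoring each lexicon's coverage of those keywords with weighted precision, recall, and F score. The abstract does not specify the keyness statistic or the weighting scheme, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes naive lowercase whitespace tokenization. It uses Dunning's log-likelihood (G2) as a stand-in keyness measure and also as each keyword's weight. It assumes a hypothetical gold set (`relevant`) of keywords judged sentiment-bearing for the use case. All function names and toy data are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def word_counts(texts: Iterable[str]) -> Counter:
    """Count tokens across a subcorpus.

    Assumption: lowercase whitespace tokenization; real clinical text would
    need a proper tokenizer and preprocessing.
    """
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for a word occurring a times in a corpus
    of c tokens and b times in a corpus of d tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected count in corpus 1
    e2 = d * total / (c + d)  # expected count in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(target: Counter, reference: Counter, top_n: int) -> Dict[str, float]:
    """Return the top_n words over-represented in target vs. reference,
    each weighted by its G2 score (assumed keyness measure and weight)."""
    c, d = sum(target.values()), sum(reference.values())
    scores: Dict[str, float] = {}
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words relatively more frequent in the target subcorpus.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


def weighted_prf(
    weights: Dict[str, float], relevant: Set[str], lexicon: Set[str]
) -> Tuple[float, float, float]:
    """Weighted precision, recall, and F1 of a lexicon over weighted keywords.

    `relevant` is a hypothetical gold set of keywords judged sentiment-bearing
    for the use case; each keyword contributes its weight rather than 1.
    """
    matched = {w for w in weights if w in lexicon}
    gold = {w for w in weights if w in relevant}
    tp = sum(weights[w] for w in matched & gold)
    matched_w = sum(weights[w] for w in matched)
    gold_w = sum(weights[w] for w in gold)
    precision = tp / matched_w if matched_w else 0.0
    recall = tp / gold_w if gold_w else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, invented notes; not data from the study.
    cases = word_counts(["patient tearful hopeless", "patient hopeless low mood"])
    controls = word_counts(["patient calm", "patient calm settled mood"])
    kw = keywords(cases, controls, top_n=3)
    toy_lexicon = {"hopeless", "calm", "low"}  # hypothetical lexicon
    print(kw)
    print(weighted_prf(kw, relevant={"hopeless", "tearful"}, lexicon=toy_lexicon))
```

For this toy input, the case keywords are "hopeless", "tearful", and "low", and precision, recall, and F1 each come out at 2/3 (up to floating-point rounding). Weighting by keyness means a lexicon that misses a highly representative case keyword loses more recall than one that misses a marginal word. This mirrors the concern in the Results that many top-ranked case keywords went unrecognized.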

Highlights

  • The World Health Organization reports that suicide accounts for 1.4% of all deaths globally and is the 18th leading cause of death worldwide [1]

  • The data for this study were a corpus of 198,451 electronic health record (EHR) texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt with those not preceding such an attempt

  • Our findings indicate that the six general-purpose sentiment lexicons evaluated are not optimal for use in suicide risk assessment

Summary

Introduction

Background

The World Health Organization reports that suicide accounts for 1.4% of all deaths globally and is the 18th leading cause of death worldwide [1]. Current methods for assessing a patient’s risk of attempting suicide are reported to perform little better than chance. New methods to understand dynamic features in electronic health records (EHRs) before a hospitalized suicide attempt, distinguishing such periods from clinical narratives written at other times, would be of potential clinical utility [4]. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs and to test correlations between the sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.

https://medinform.jmir.org/2021/4/e22397
