Abstract
Background: We examined the comparative performance of structured, diagnostic codes vs. natural language processing (NLP) of unstructured text for screening suicidal behavior among pregnant women in electronic medical records (EMRs).
Methods: Women aged 10–64 years with at least one diagnostic code related to pregnancy or delivery (N = 275,843) from Partners HealthCare were included as our “datamart.” Diagnostic codes related to suicidal behavior were applied to the datamart to screen women for suicidal behavior. Among women without any diagnostic codes related to suicidal behavior (n = 273,410), 5880 women were randomly sampled, of whom 1120 had at least one mention of terms related to suicidal behavior in clinical notes. NLP was then used to process clinical notes for the 1120 women. Chart reviews were performed for subsamples of women.
Results: Using diagnostic codes, 196 pregnant women were screened positive for suicidal behavior, among whom 149 (76%) had confirmed suicidal behavior by chart review. Using NLP among those without diagnostic codes, 486 pregnant women were screened positive for suicidal behavior, among whom 146 (30%) had confirmed suicidal behavior by chart review.
Conclusions: The use of NLP substantially improves the sensitivity of screening suicidal behavior in EMRs. However, the prevalence of confirmed suicidal behavior was lower among women who did not have diagnostic codes for suicidal behavior but screened positive by NLP. NLP should be used together with diagnostic codes for future EMR-based phenotyping studies for suicidal behavior.
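The percentages reported in the Results are the share of screen-positive women whose suicidal behavior was confirmed by chart review. The following minimal sketch only reproduces that arithmetic from the counts given in the abstract; the function name `confirmed_share` is a hypothetical helper and is not part of the study's code.

```python
def confirmed_share(confirmed: int, screened_positive: int) -> float:
    """Fraction of screen-positive women with chart-confirmed suicidal behavior."""
    if screened_positive <= 0:
        raise ValueError("screened_positive must be a positive count")
    return confirmed / screened_positive


# Counts taken from the abstract's Results section.
codes_share = confirmed_share(149, 196)  # diagnostic-code screen
nlp_share = confirmed_share(146, 486)    # NLP screen, women without codes

print(f"Diagnostic codes: {codes_share:.0%}")
print(f"NLP (no codes):   {nlp_share:.0%}")
```

Rounded to whole percentages, these values match the 76% and 30% reported above.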
Highlights
We examined the comparative performance of structured, diagnostic codes vs. natural language processing (NLP) of unstructured text for screening suicidal behavior among pregnant women in electronic medical records (EMRs)
We initially identified women aged 10–64 years with at least one diagnostic code related to pregnancy or delivery (International Classification of Diseases-10 [ICD-10]: Z3A.*, O0.*-O9.*; ICD-9: 640.*-679.*, V22.*, V23.*, V24.*, V27.*, V28.*; Diagnosis-Related Group [DRG]: 370–384) in the EMRs from January 1, 1996 to March 31, 2016, totaling 275,843 women included in the datamart (Fig. 1)
Chart review to obtain estimates of the prevalence of confirmed suicidal behavior: after the screening process, one of the authors (QYZ) manually reviewed the clinical notes for random samples of (1) 50 women from the diagnostic codes group (N = 196); (2) 100 women from the NLP group (N = 486); (3) 100 women from the NLP not relevant group (N = 634); and (4) 100 women who had neither diagnostic codes nor term mentions related to suicidal behavior (N = 4162)
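The pregnancy and delivery code ranges listed in the highlights above can be read as prefix and range rules over three code systems. The sketch below is one possible way to express those rules, not the authors' implementation: the function name `is_pregnancy_code`, the system labels `"ICD10"`, `"ICD9"`, and `"DRG"`, and the assumption that codes arrive as plain strings are all hypothetical.

```python
import re

# ICD-10: Z3A.* and O0.* through O9.* (prefix match).
ICD10_PATTERN = re.compile(r"^(Z3A|O[0-9])")
# ICD-9 V codes: V22.*, V23.*, V24.*, V27.*, V28.*.
ICD9_V_PATTERN = re.compile(r"^V(22|23|24|27|28)(\.|$)")
# ICD-9 numeric codes: three-digit category, optionally followed by a decimal part.
ICD9_NUMERIC_PATTERN = re.compile(r"^(\d{3})(\.|$)")


def is_pregnancy_code(system: str, code: str) -> bool:
    """Return True if the code falls in the pregnancy/delivery ranges listed above."""
    code = code.strip().upper()
    if system == "ICD10":
        return bool(ICD10_PATTERN.match(code))
    if system == "ICD9":
        if ICD9_V_PATTERN.match(code):
            return True
        match = ICD9_NUMERIC_PATTERN.match(code)
        # ICD-9 categories 640 through 679.
        return bool(match) and 640 <= int(match.group(1)) <= 679
    if system == "DRG":
        # DRG 370 through 384.
        return code.isdigit() and 370 <= int(code) <= 384
    return False
```

For example, `is_pregnancy_code("ICD9", "V22.1")` is true and `is_pregnancy_code("ICD9", "V25.0")` is false under these rules. A real datamart query would also need the date window and age restriction described above, which this sketch omits.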
Summary
We examined the comparative performance of structured, diagnostic codes vs. natural language processing (NLP) of unstructured text for screening suicidal behavior among pregnant women in electronic medical records (EMRs). The reported low sensitivity of billing codes for identifying suicidal behavior implies that a sizable portion of suicidal cases may be missed when case-finding relies on ICD codes alone. The increasing utilization of EMRs has provided unprecedented opportunities for identifying pregnant women with suicidal behavior. The automated examination of a large volume of clinical notes requires the use of NLP [23], a field of computational linguistics that allows computers to extract relevant information from unstructured human language [22]. Very few studies have used NLP to identify suicidal behavior in EMRs [10, 33, 34], and no study has reported any classification algorithm that is highly predictive of suicidal behavior.