Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing

Sifei Han, Robert F Zhang, Lingyun Shi, Russell Richie, Haixia Liu, Andrew Tseng, Wei Quan, Neal Ryan, David Brent, Fuchiang R Tsui

doi:10.1016/j.jbi.2021.103984

Abstract

Objective: Social determinants of health (SDOH) are non-medical factors that can profoundly impact patient health outcomes. However, SDOH are rarely available in structured electronic health record (EHR) data such as diagnosis codes, and are more commonly found in unstructured narrative clinical notes. Hence, identifying social context from unstructured EHR data has become increasingly important. Yet previous work on using natural language processing to automate extraction of SDOH from text (a) usually focuses on an ad hoc selection of SDOH, and (b) does not use the latest advances in deep learning. Our objective was to advance automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH based on standard biomedical and psychiatric ontologies, and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes.

Design: A retrospective cohort study.

Setting and participants: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social-related sentences from 2,670 clinical notes.

Methods: We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the “Social Work” category in the MIMIC-III Clinical Database. Using standard terminologies, SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for these. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures – convolutional neural network (CNN), long short-term memory (LSTM) network, and the Bidirectional Encoder Representations from Transformers (BERT) – for automated detection of eight SDOH categories. We also compared these DNNs to three baseline models: (1) cTAKES, as well as (2) L2-regularized logistic regression and (3) random forests on bags-of-words. Model evaluation metrics included micro- and macro-F1, and area under the receiver operating characteristic curve (AUC).

Results: All three DNN models accurately classified all SDOH categories (minimum micro-F1 = 0.632, minimum macro-AUC = 0.854). Compared to the CNN and LSTM, BERT performed best in most key metrics (micro-F1 = 0.690, macro-AUC = 0.907). The BERT model most effectively identified the “occupational” category (F1 = 0.774, AUC = 0.965) and least effectively identified the “non-SDOH” category (F1 = 0.491, AUC = 0.788). BERT outperformed cTAKES in distinguishing social vs. non-social sentences (BERT F1 = 0.87 vs. cTAKES F1 = 0.06), and outperformed logistic regression (micro-F1 = 0.649, macro-AUC = 0.696) and random forest (micro-F1 = 0.502, macro-AUC = 0.523) trained on bag-of-words.

Conclusions: Our study framework with DNN models demonstrated improved performance for efficiently identifying a systematic range of SDOH categories from clinical notes in the EHR. Improved identification of patient SDOH may further improve healthcare outcomes.
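The abstract reports both micro- and macro-averaged F1, which can diverge when categories differ in size or difficulty. The following is a minimal sketch of how the two averages are typically computed, not the authors' evaluation code. The function names, the toy labels, and the "housing" category are hypothetical, and the set-per-sentence (multi-label) representation is an assumption, since the abstract does not state how labels were encoded.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, categories):
    """Return (micro_f1, macro_f1, per_category_f1).

    gold and pred are lists of label sets, one set per sentence
    (an assumed representation, not taken from the paper).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in categories:
            if c in p and c in g:
                tp[c] += 1
            elif c in p:
                fp[c] += 1
            elif c in g:
                fn[c] += 1
    per_cat = {c: f1(tp[c], fp[c], fn[c]) for c in categories}
    # Macro: average the per-category scores, so each category counts equally.
    macro = sum(per_cat.values()) / len(categories)
    # Micro: pool the counts first, so each individual decision counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro, per_cat


# Toy data: "housing" is a hypothetical category used only for illustration.
categories = ["occupational", "housing", "non-SDOH"]
gold = [{"occupational"}, {"occupational"}, {"housing"}, {"non-SDOH"}]
pred = [{"occupational"}, {"occupational"}, {"non-SDOH"}, {"non-SDOH"}]

micro, macro, per_cat = micro_macro_f1(gold, pred, categories)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy example the missed "housing" sentence lowers the macro average more than the micro average, because macro-F1 weights the rare category as heavily as the frequent one.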

Journal: Journal of Biomedical Informatics
Publication Date: Jan 7, 2022
Citations: 58
License type: publisher-specific-oa

Similar Papers

Engineering Document Summarization Using Sentence Representations Generated by Bidirectional Language Model
Yan Jin ... Yunjian Qiu
17 Aug 2021

Identification of asthma control factor in clinical notes using a hybrid deep learning model
Bhavani Singh Agnikula Kshatriya ... Chung-Il Wi
BMC Medical Informatics and Decision Making | VOL. 21
01 Nov 2021

Classification Performance Comparison of BERT and IndoBERT on SelfReport of COVID-19 Status on Social Media
Irwan Budiman ... Mohammad Reza Faisal
Journal of Computer Sciences Institute | VOL. 30
20 Mar 2024

Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT
Shoya Wada ... Yasushi Matsumura
Artificial Intelligence In Medicine | VOL. 153
05 May 2024
