A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study.

Maarten Homburg, Tim Olde Hartman, Thijmen Kupers, Marjolein Berger, Jeroen Kuiper, Premysl Velek, Lilian Peters, Eline Meijer, Matthijs Berends, Evelien De Schepper, Jean Muris

doi:10.2196/49944

Abstract

Natural language processing (NLP) models such as bidirectional encoder representations from transformers (BERT) could make disease identification from electronic health records (EHRs) more efficient and more accurate. However, applying them in practice settings demands a comprehensive, multidisciplinary approach to development and validation. The COVID-19 pandemic highlighted challenges in disease identification caused by limited testing availability and the difficulty of handling unstructured data. In the Netherlands, where general practitioners (GPs) serve as the first point of contact for health care, the EHRs generated by these primary care providers contain a wealth of potentially valuable information. Nonetheless, the unstructured nature of free-text EHR entries makes it difficult to identify trends, detect disease outbreaks, or accurately pinpoint COVID-19 cases.

This study aims to develop and validate a BERT model for detecting COVID-19 consultations in general practice EHRs in the Netherlands.

The BERT model was pretrained on Dutch language data and fine-tuned on a comprehensive EHR data set comprising confirmed COVID-19 GP consultations and non-COVID-19-related consultations. The data set was partitioned into a training set and a development set, and the model's performance was evaluated on an independent test set, which served as the primary measure of its effectiveness in COVID-19 detection. The final model was then validated through 3 approaches. First, external validation was applied on an EHR data set from a different geographic region in the Netherlands. Second, validation was conducted against polymerase chain reaction (PCR) test results obtained from municipal health services. Lastly, the correlation between predicted outcomes and COVID-19-related hospitalizations in the Netherlands was assessed for the period around the outbreak of the pandemic in the Netherlands, that is, the period before widespread testing.

Model development used 300,359 GP consultations. The resulting model identified COVID-19 consultations with high accuracy (accuracy 0.97, F1-score 0.90, precision 0.85, recall 0.85, specificity 0.99). External validation showed comparably high performance. Validation on PCR test data showed high recall but low precision and specificity. Validation using hospital data showed a significant correlation between the model's COVID-19 predictions and COVID-19-related hospitalizations (F1-score 96.8; P<.001; R²=0.69). Most importantly, the model predicted COVID-19 cases weeks before the first confirmed case in the Netherlands.

The developed BERT model accurately identified COVID-19 cases among GP consultations, even before the first confirmed cases. Its validated performance highlights the potential of NLP models to identify disease outbreaks early and exemplifies the value of multidisciplinary efforts in harnessing technology for disease identification. Moreover, the implications of this study extend beyond COVID-19: the approach offers a blueprint for the early recognition of various illnesses and suggests that such models could transform disease surveillance.
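The performance measures named in the abstract (accuracy, precision, recall, specificity, and F1-score) are all derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' evaluation code; the `ConfusionCounts` class, the `binary_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper, not from the study)."""
    tp: int  # COVID-19 consultations predicted as COVID-19
    fp: int  # non-COVID-19 consultations predicted as COVID-19
    tn: int  # non-COVID-19 consultations predicted as non-COVID-19
    fn: int  # COVID-19 consultations predicted as non-COVID-19


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard binary classification metrics from confusion counts."""
    precision = _ratio(c.tp, c.tp + c.fp)    # share of predicted positives that are correct
    recall = _ratio(c.tp, c.tp + c.fn)       # share of true positives that are found
    specificity = _ratio(c.tn, c.tn + c.fp)  # share of true negatives that are found
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only (assumed for this example, not study data).
counts = ConfusionCounts(tp=80, fp=20, tn=880, fn=20)
metrics = binary_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced data set such as this illustrative one, accuracy and specificity are dominated by the large negative class, which is why precision, recall, and F1-score are usually reported alongside them.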


Journal: Journal of Medical Internet Research
Publication Date: Oct 4, 2023
Citations: 1
License type: CC BY


