Automatic Classification of Thyroid Findings Using Static and Contextualized Ensemble Natural Language Processing Systems: Development Study.

Dongyup Shin, Hye Jin Kam, Min-Seok Jeon, Ha Young Kim

doi:10.2196/30223

Abstract

Background: Korean institutions and enterprises collect electronic medical examination results in nonstandardized, nonunified formats from multiple medical institutions. A group of experienced nurses who can understand the results and their context initially classified the reports manually. The classification guidelines were built on years of the workers' clinical experience, and there were attempts to automate the classification work. However, rule-based algorithms and labor-intensive manual effort are time-consuming and limited by a high potential for error. We investigated natural language processing (NLP) architectures and proposed ensemble models to create automated classifiers.

Objective: This study aimed to develop practical deep learning models with electronic medical records from 284 health care institutions and open-source corpus data sets for automatically classifying 3 thyroid conditions: healthy, caution required, and critical. The primary goal is to increase the overall accuracy of the classification, yet there are practical and industrial needs to correctly predict healthy (negative) thyroid condition data, which make up most of the medical examination results, and to minimize false-negative rates under the prediction of healthy thyroid conditions.

Methods: The data sets included thyroid and comprehensive medical examination reports. The textual data are documented not only in complete sentences but also as lists of words or phrases. Therefore, we propose static and contextualized ensemble NLP network (SCENT) systems to reflect both static and contextual information and to handle incomplete sentences. We prepared convolutional neural network (CNN)-, long short-term memory (LSTM)-, and efficiently learning an encoder that classifies token replacements accurately (ELECTRA)-based ensemble models by training or fine-tuning each architecture multiple times. Through comprehensive experiments, we propose 2 versions of ensemble models, SCENT-v1 and SCENT-v2, alongside the single-architecture–based CNN, LSTM, and ELECTRA ensemble models, for the best classification performance and practical use, respectively. SCENT-v1 is an ensemble of CNN and ELECTRA ensemble models, and SCENT-v2 is a hierarchical ensemble of CNN, LSTM, and ELECTRA ensemble models. SCENT-v2 first classifies the 3 labels using an ELECTRA ensemble model and then reclassifies the reports using an ensemble model of CNN and LSTM if the ELECTRA ensemble model predicted them as “healthy.”

Results: SCENT-v1 outperformed all the suggested models, with the highest F1 score (92.56%). SCENT-v2 had the second-highest recall value (94.44%) and the fewest misclassifications for the caution-required thyroid condition while maintaining 0 classification errors for the critical thyroid condition under the prediction of the healthy thyroid condition.

Conclusions: The proposed SCENT demonstrates good classification performance despite the unique characteristics of the Korean language and the problems of data scarcity and imbalance, especially the extremely small amount of critical condition data. The result of SCENT-v1 indicates that different perspectives of static and contextual input token representations can enhance classification performance. SCENT-v2 has a strong impact on the prediction of healthy thyroid conditions.
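The two-stage SCENT-v2 decision described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the members of each ensemble are combined, so averaging class probabilities (soft voting) is an assumption, and the names `average_probs`, `argmax_label`, and `scent_v2_predict` as well as the stub models are hypothetical.

```python
from typing import Callable, List, Sequence

# Label order is an assumption made for this sketch.
LABELS = ("healthy", "caution required", "critical")

# A "model" here is any callable mapping report text to class probabilities.
Model = Callable[[str], Sequence[float]]


def average_probs(models: Sequence[Model], text: str) -> List[float]:
    """Soft voting (assumed): average class probabilities over ensemble members."""
    outputs = [model(text) for model in models]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def argmax_label(probs: Sequence[float]) -> str:
    """Return the label with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best]


def scent_v2_predict(
    text: str,
    electra_models: Sequence[Model],
    cnn_lstm_models: Sequence[Model],
) -> str:
    """Stage 1: the ELECTRA ensemble classifies all 3 labels.
    Stage 2: only reports predicted "healthy" are reclassified
    by the CNN + LSTM ensemble."""
    first = argmax_label(average_probs(electra_models, text))
    if first != "healthy":
        return first
    return argmax_label(average_probs(cnn_lstm_models, text))


# Stub models standing in for trained networks (hypothetical outputs).
electra = [lambda t: [0.7, 0.2, 0.1], lambda t: [0.5, 0.4, 0.1]]
cnn_lstm = [lambda t: [0.3, 0.6, 0.1], lambda t: [0.4, 0.5, 0.1]]
prediction = scent_v2_predict("example report", electra, cnn_lstm)
```

Routing only the "healthy" predictions to a second ensemble matches the stated practical goal: a report leaves the pipeline as healthy only if both stages agree, which targets false negatives under the healthy prediction.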

Highlights

In South Korea, a large portion of medical services are maintained and operated under the public health insurance system [1,2,3,4], and the Korean National Health Insurance Corporation conducts biannual national health screening examinations.
Static and contextualized ensemble NLP network (SCENT)-v1 is an ensemble of convolutional neural network (CNN) and efficiently learning an encoder that classifies token replacements accurately (ELECTRA) ensemble models, and SCENT-v2 is a hierarchical ensemble of CNN, long short-term memory (LSTM), and ELECTRA ensemble models.
The result of SCENT-v1 indicates that different perspectives of static and contextual input token representations can enhance classification performance.
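As a rough illustration of how SCENT-v1 might combine a static-embedding model family (CNN) with a contextual one (ELECTRA), the sketch below first averages the class probabilities of repeated training runs within each architecture and then averages the two architecture-level results. Equal-weight averaging at both levels is an assumption, since the text does not specify the combination rule; the function names and the probability values are hypothetical.

```python
from typing import List, Sequence, Tuple

# Label order is an assumption made for this sketch.
LABELS = ("healthy", "caution required", "critical")


def ensemble_mean(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several class-probability vectors."""
    count = len(prob_vectors)
    return [sum(column) / count for column in zip(*prob_vectors)]


def scent_v1_predict(
    cnn_runs: Sequence[Sequence[float]],
    electra_runs: Sequence[Sequence[float]],
) -> Tuple[str, List[float]]:
    """Combine the CNN ensemble and the ELECTRA ensemble (assumed equal weights)."""
    cnn_probs = ensemble_mean(cnn_runs)          # static-representation view
    electra_probs = ensemble_mean(electra_runs)  # contextual-representation view
    combined = ensemble_mean([cnn_probs, electra_probs])
    best = max(range(len(combined)), key=lambda i: combined[i])
    return LABELS[best], combined


# Hypothetical outputs of two training runs per architecture for one report.
cnn_runs = [[0.2, 0.7, 0.1], [0.4, 0.5, 0.1]]
electra_runs = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]
label, combined = scent_v1_predict(cnn_runs, electra_runs)
```

In this toy input the ELECTRA runs alone would favor "healthy", while the combined vote lands on "caution required", which shows how a second representation can change the outcome.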

Summary

Introduction

In South Korea, a large portion of medical services are maintained and operated under the public health insurance system [1,2,3,4], and the Korean National Health Insurance Corporation conducts biannual national health screening examinations. Entrusted companies conduct the examinations in partnership with affiliated examination centers in large hospitals or professional examination centers and collect the results from individual medical institutions to provide follow-up health care services to clients. To keep these services consistent, a group of nurses experienced in examination work was established; following classification guidelines based on important keywords, they manually classify individual test results and organize them into a single unified format. Because these institutions and enterprises receive electronic medical examination results in nonstandardized, nonunified formats from multiple medical institutions, the nurses, who can understand the results and their context, initially classified the reports manually. We investigated natural language processing (NLP) architectures and proposed ensemble models to create automated classifiers.

Journal: JMIR Medical Informatics	Publication Date: Sep 21, 2021
Citations: 8	License type: cc-by
