Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Kun Zeng, Yibin Xu, Likeng Liang, Tianyong Hao, Ge Lin

doi:10.1186/s12911-021-01492-z

Abstract

Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data.

Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embeddings of each base model so that the learned features are easier to distinguish. Soft voting is applied to produce the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.

Results: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement.

Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning was able to improve word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
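To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of focal loss for a single example and of soft voting over several base models. It is an illustration under stated assumptions, not the authors' implementation: the function names, the defaults gamma=2.0 and alpha=1.0, and the equal model weights are all hypothetical, since the page gives no hyperparameters. The focal loss follows the commonly used form -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class.

```python
import math


def softmax(logits):
    """Turn raw model scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    probs  : class probabilities predicted by a model
    target : index of the true class
    gamma, alpha : assumed defaults, not values taken from the paper
    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists, weights=None):
    """Soft voting: average per-class probabilities across models.

    prob_lists : one probability vector per base model
    weights    : optional per-model weights (assumed equal if omitted)
    Returns (predicted class index, averaged probabilities).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Toy usage: three hypothetical base models scoring a two-class problem.
model_probs = [[0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
label, averaged = soft_vote(model_probs)
print(label, averaged)
print(focal_loss(softmax([2.0, 0.5]), target=0))
```

Because the factor (1 - p_t)^gamma shrinks toward zero for confidently correct predictions, well-classified examples from frequent categories contribute little to the loss, which is the usual reason focal loss is chosen for imbalanced data.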

Highlights

Eligibility criteria are the primary strategy for screening the target participants of a clinical trial
A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed
The experiments demonstrated that the classification performance was improved by our ensemble model significantly

Summary

Introduction

Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. Existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Recruitment of clinical trial subjects is generally carried out by manually comparing medical records with eligibility criteria [2], which is time-consuming and costly [3]. Eligibility criteria text is usually short and informally written, which leaves it with sparse features. Constructing an automatic method that uses natural language processing (NLP) techniques to classify clinical trial eligibility criteria text effectively remains a challenging research problem [5, 6].



Journal: BMC Medical Informatics and Decision Making
Publication Date: Jul 1, 2021
Citations: 9
License type: open-access


Similar Papers

An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation.
Kun Zeng ... Yingying Qu
JMIR Medical Informatics | VOL. 8
01 Jul 2020

Bert model fine-tuning for text classification in knee OA radiology reports
L Chen ... V Pedoia
Osteoarthritis and Cartilage | VOL. 28
01 Apr 2020

Does BERT need domain adaptation for clinical negation detection?
Chen Lin ... Farig Sadeque
Journal of the American Medical Informatics Association | VOL. 27
11 Feb 2020

Relation Classification for Bleeding Events From Electronic Health Records Using Deep Learning Systems: An Empirical Study
Avijit Mitra ... Bhanu Pratap Singh Rawat
JMIR Medical Informatics | VOL. 9
02 Jul 2021
