Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

Julien Knafou, Nikolay Borissov, Poorya Amini, Aziz Mert Ipekci, Quentin Haas, Michel Counotte, Leonie Heron, Douglas Teodoro, Diana Buitrago-Garcia, Nicola Low, Hira Imeri

doi:10.1186/s13643-023-02247-9

Abstract

Background

The COVID-19 pandemic has led to an unprecedented number of scientific publications. Multiple living systematic reviews have been developed to assist professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based algorithms to classify COVID-19-related publications and help scale up the epidemiological curation process.

Methods

In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.

Results

The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2% at the class level of the classification task. The gap between the standalone and ensemble models widened at the sub-subclass level, where the ensemble reached a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, at 89%. Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, detecting original papers with an F1-score of up to 97% on a subset covering 80% of the collection, compared with 93% on the whole dataset.

Conclusion

This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and to support epidemiological curation and review. The ensemble consistently and significantly outperformed every standalone model. Adjusting the voting strategy thresholds offers a way to annotate a subset of the data with higher predictive confidence.
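The voting strategies and the recall@3 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the label strings ("original", "non-original"), the abstain-on-disagreement behaviour, and the assumption of a single gold label per article for recall@k are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def unanimity_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the shared label when all models agree; otherwise abstain (None).

    Assumption: abstaining is how the high-confidence subset is formed.
    """
    if not predictions:
        return None
    return predictions[0] if len(set(predictions)) == 1 else None


def majority_vote(predictions: Sequence[str]) -> str:
    """Alternative rule: the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


def coverage(votes: Sequence[Optional[str]]) -> float:
    """Fraction of articles for which the unanimity rule produced a label."""
    return sum(v is not None for v in votes) / len(votes)


def recall_at_k(ranked: Sequence[str], gold: str, k: int = 3) -> bool:
    """True if the gold label appears among the top-k ranked labels.

    Assumption: each article has exactly one gold label.
    """
    return gold in ranked[:k]
```

Under these assumptions, articles on which the five models disagree receive no label from `unanimity_vote` and would be left for manual review, while `coverage` reports how much of the collection is labelled automatically. Averaging `recall_at_k` over all articles gives a recall@3 figure of the kind reported for the ranking task.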

Journal: Systematic Reviews
Publication Date: Jun 5, 2023
Citations: 2
License type: open-access
