Supporting systematic reviews using LDA-based document representations.

Yuanhan Mo, Georgios Kontonatsios, Sophia Ananiadou

doi:10.1186/s13643-015-0117-0

Open Access

https://doi.org/10.1186/s13643-015-0117-0

Journal: Systematic Reviews
Publication Date: Nov 26, 2015
Citations: 93
License type: CC BY

Affiliation: University of Manchester

Abstract

Background: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW).

Methods: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation.

Results: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews of the clinical domain and three reviews of the social science domain.

Conclusions: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers.

Electronic supplementary material: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
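The contrast between the two document representations described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy citations, the function names (`bow_vector`, `lda_topic_vectors`) and the hyperparameters (`alpha`, `beta`, `iters`) are assumptions made for illustration, and the topics are inferred with a basic collapsed Gibbs sampler, which may differ from the inference method used in the paper. The ATR enrichment and the SVM classifier are left out; in practice either vector would be passed to a classifier as its feature input.

```python
import random
from collections import Counter


def bow_vector(tokens, vocab):
    """Bag-of-words: one raw count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def lda_topic_vectors(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimised).

    Returns one topic distribution (a list of num_topics probabilities)
    per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assign = []                                                 # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(topics)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wid = assign[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed, normalised topic proportions for each document.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Hypothetical toy "citations", already tokenised.
docs = [
    "randomised trial statin therapy cholesterol".split(),
    "statin therapy trial cholesterol outcome".split(),
    "school programme youth behaviour intervention".split(),
    "youth behaviour school intervention outcome".split(),
]
vocab = sorted({w for doc in docs for w in doc})
bow_vectors = [bow_vector(doc, vocab) for doc in docs]
topic_vectors = lda_topic_vectors(docs, num_topics=2)
```

Each BOW vector has one dimension per vocabulary word, whereas each topic vector has only `num_topics` dimensions and sums to 1. This is the compact representation that the abstract compares against BOW.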

Highlights

Identifying relevant studies for inclusion in a systematic review is a complex, laborious and expensive task
The datasets were used as the basis for the intrinsic evaluation of the different text classification methods

Summary

Introduction

Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. The screening phase of a systematic review aims to identify citations relevant to a research topic according to a pre-defined protocol [1, 2, 3, 4] known as the Population, Intervention, Comparator and Outcome (PICO) framework. The number of relevant citations is usually far lower than the number of irrelevant ones, which means that reviewers have to deal with an extremely imbalanced dataset. To overcome these limitations, methods such as machine learning, text mining [9, 10], text classification [11] and active learning [6, 12] have been used to partially automate this process and reduce the workload without sacrificing the quality of the reviews.
