A semi-supervised approach using label propagation to support citation screening

Georgios Kontonatsios, Austin J Brockmeier, Piotr Przybyła, John McNaught, Tingting Mu, John Y Goulermas, Sophia Ananiadou

doi:10.1016/j.jbi.2017.06.018


Open Access


Abstract

Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation to designate whether it is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a large number of manually labelled citations before the text classifier achieves robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations, improving classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should also be included). To calculate the similarity between labelled and unlabelled citations, we investigate two different feature spaces, namely a bag-of-words representation and a spectral embedding derived from the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are then combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from the clinical and public health domains. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.
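The propagation step described in the abstract can be illustrated with a minimal Python sketch. It assumes a plain bag-of-words (lower-cased whitespace tokens, raw counts), cosine similarity, and a simple nearest-neighbour rule with a similarity threshold. The function names, the `threshold` parameter, and the toy citation titles are hypothetical, and the rule is one simple form of label propagation, not necessarily the authors' exact scheme.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokenisation into a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(labelled, unlabelled, threshold=0.5):
    """Hypothetical nearest-neighbour propagation rule.

    labelled:   non-empty list of (text, label) pairs; 1 = include, 0 = exclude
    unlabelled: list of citation texts
    Each unlabelled citation inherits the label of its most similar labelled
    citation, but only if that similarity reaches `threshold`; the rest stay
    unlabelled. Returns the augmented training set (manual + propagated).
    """
    labelled_vecs = [(bag_of_words(text), label) for text, label in labelled]
    augmented = list(labelled)
    for text in unlabelled:
        vec = bag_of_words(text)
        best_sim, best_label = max(
            ((cosine(vec, lv), label) for lv, label in labelled_vecs),
            key=lambda pair: pair[0],
        )
        if best_sim >= threshold:
            augmented.append((text, best_label))
    return augmented


# Toy example with made-up citation titles.
labelled = [
    ("smoking cessation intervention trial", 1),
    ("protein folding simulation", 0),
]
unlabelled = [
    "randomised smoking cessation trial",
    "molecular dynamics of protein folding",
    "tax policy review",
]
augmented = propagate_labels(labelled, unlabelled, threshold=0.4)
for text, label in augmented:
    print(label, text)
```

Citations whose nearest labelled neighbour falls below the threshold (here "tax policy review") are left for manual screening, so propagated labels stay limited to close neighbours. A weighted representation such as TF-IDF could replace the raw counts without changing the rest of the sketch.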

Highlights

Systematic reviews are used to identify relevant citations and answer research questions by gathering, filtering, and synthesising research evidence.
The contributions that we make in this paper can be summarised as follows: a) we propose a new semi-supervised active learning method to facilitate citation screening in clinical and public health reviews; b) we show that a low-dimensional spectral embedded feature space addresses the high terminological variation in public health reviews more efficiently than the bag-of-words representation; and c) experiments across two clinical and four public health reviews demonstrate that our method achieves significant improvements over two existing state-of-the-art active learning methods when only a limited number of labelled instances is available for training.
The public health reviews were developed by the EPPI-Centre and reused by Miwa et al. [8] to investigate the performance of both certainty- and uncertainty-based active learners.
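Contribution b) relies on a low-dimensional spectral embedding of the bag-of-words space. As a hedged illustration only, the NumPy sketch below computes a generic spectral embedding: cosine affinities between citations, symmetric degree normalisation, and the leading non-trivial eigenvectors. The affinity choice, the normalisation, the function name `spectral_embedding`, and `n_components` are assumptions; the paper's exact construction may differ.

```python
import numpy as np


def spectral_embedding(X, n_components=2):
    """Map rows of a document-term matrix X (citations x terms) to
    n_components dimensions using eigenvectors of the normalised
    cosine-affinity matrix D^{-1/2} W D^{-1/2}."""
    # L2-normalise rows so that Xn @ Xn.T holds pairwise cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)  # ignore self-similarity
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(degrees == 0, 1.0, degrees))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns the eigenvalues of the symmetric matrix S in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    # The leading eigenvector is proportional to sqrt(degree) and carries
    # no cluster information, so skip it and keep the next n_components.
    return eigvecs[:, order[1:n_components + 1]]


# Toy term counts: citations 0-1 share terms 0-1, citations 2-3 share
# terms 2-3, and all four share term 4.
X = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 0, 0, 1],
    [0, 0, 3, 2, 1],
    [0, 0, 2, 3, 1],
], dtype=float)
emb = spectral_embedding(X, n_components=1)
print(emb.ravel())
```

In the one-dimensional embedding the two groups of citations land on opposite sides of zero (the overall sign is arbitrary), so label propagation could be run on these dense coordinates instead of the sparse word counts. The dense eigendecomposition scales cubically with the number of citations; for thousands of citations a sparse solver such as `scipy.sparse.linalg.eigsh` would be more practical.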

Summary

Introduction

Systematic reviews are used to identify relevant citations and answer research questions by gathering, filtering, and synthesising research evidence. To identify and subsequently analyse every possible eligible study, reviewers need to exhaustively filter out citations (retrieved by searches of literature databases) that do not fulfil the underlying eligibility criteria. An experienced reviewer requires 30 seconds on average to decide whether a single citation is eligible for inclusion in the review, and this can extend to several minutes for complex topics [2]. This amounts to a considerable human workload, given that a typical screening task involves manually screening thousands of citations [3, 4, 5].
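To make the scale concrete, assume a hypothetical screening task of 5,000 citations (the text only says "thousands"). At the reported average of 30 seconds per citation, and at an assumed 2 minutes per citation for a complex topic:

```latex
\begin{aligned}
5000 \times 30\ \text{s} &= 150\,000\ \text{s} \approx 41.7\ \text{h},\\
5000 \times 2\ \text{min} &= 10\,000\ \text{min} \approx 166.7\ \text{h}.
\end{aligned}
```

Under these assumptions, screening alone occupies roughly one to four 40-hour working weeks of a single reviewer's time.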



Journal: Journal of Biomedical Informatics
Publication Date: Jun 23, 2017
Citations: 34
License type: cc-by


Similar Papers

Characteristics of departments with high-use of active learning in introductory STEM courses: implications for departmental transformation
Alexandra C Lau ... Estrella Johnson
International journal of STEM education | VOL. 11 | 12 Feb 2024

A Deep Active Learning Approach to the Automatic Classification of Volcano-Seismic Events
Grace F Manley ... David M Pyle
Frontiers in earth science | VOL. 10 | 15 Feb 2022

Measuring the Effectiveness of Faculty Feedback on the use of an active integrated instructional pedagogy for the embryology course
Mohamed A Eladl ... Salman Y Guraya
Journal of Taibah University Medical Sciences | VOL. 17 | 29 Sep 2021

Uncertainty-Based Selective Clustering for Active Learning
Sekjin Hwang ... Joonsoo Choi
IEEE access : practical innovations, open solutions | VOL. 10 | 01 Jan 2021
