Abstract
We sought to apply natural language processing to the task of automatic risk of bias assessment in preclinical literature, which could speed the process of systematic review, provide information to guide research improvement activity, and support translation from preclinical to clinical research. We use 7840 full-text publications describing animal experiments with yes/no annotations for five risk of bias items. We implement a series of models including baselines (support vector machine, logistic regression, random forest), neural models (convolutional neural network, recurrent neural network with attention, hierarchical neural network) and models using BERT with two strategies (document chunk pooling and sentence extraction). We tune hyperparameters to obtain the highest F1 score for each risk of bias item on the validation set and compare evaluation results on the test set with our previous regular expression approach. On the test set, the F1 scores of the best models are 82.0% for random allocation, 81.6% for blinded assessment of outcome, 82.6% for conflict of interests, 91.4% for compliance with animal welfare regulations and 46.6% for reporting animals excluded from analysis. Our models significantly outperform regular expressions for four risk of bias items. For random allocation, blinded assessment of outcome, conflict of interests and animal exclusions, neural models achieve good performance; for animal welfare regulations, a BERT model with a sentence extraction strategy works better. Convolutional neural networks are the overall best models. The tool is publicly available and may contribute to the future monitoring of risk of bias reporting for research improvement activities.
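The document chunk pooling strategy mentioned above addresses BERT's limit on input length (512 positions) by scoring a long full text in pieces and pooling the chunk scores into one document-level score. The sketch below is a minimal illustration under stated assumptions: the 510-token window (leaving room for the [CLS] and [SEP] tokens), the 255-token stride, the max/mean pooling options and the toy keyword scorer standing in for a fine-tuned BERT classifier are all hypothetical, not the settings used in this work.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[str], max_len: int = 510, stride: int = 255) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    Hypothetical defaults: 510 leaves room for BERT's [CLS] and [SEP] tokens,
    and a stride of half the window gives 50% overlap between chunks.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):  # this window reaches the end of the document
            break
        start += stride
    return chunks


def document_score(
    tokens: Sequence[str],
    score_chunk: Callable[[List[str]], float],
    pooling: str = "max",
    max_len: int = 510,
    stride: int = 255,
) -> float:
    """Score every chunk with a chunk-level classifier and pool into one document score."""
    scores = [score_chunk(chunk) for chunk in chunk_tokens(tokens, max_len, stride)]
    if pooling == "max":   # item counts as reported if any single chunk reports it
        return max(scores)
    if pooling == "mean":  # average the evidence across all chunks
        return sum(scores) / len(scores)
    raise ValueError(f"unknown pooling: {pooling}")


def toy_random_allocation_scorer(chunk: List[str]) -> float:
    """Hypothetical stand-in for a fine-tuned BERT chunk classifier."""
    return 1.0 if "randomly" in chunk else 0.0


if __name__ == "__main__":
    doc = ["filler"] * 600 + ["animals", "were", "randomly", "allocated"]
    print(len(chunk_tokens(doc)))                                   # 2
    print(document_score(doc, toy_random_allocation_scorer, "max"))   # 1.0
    print(document_score(doc, toy_random_allocation_scorer, "mean"))  # 0.5
```

With max pooling, a single chunk that mentions the item is enough to mark the document as reported; mean pooling dilutes that signal across long texts, so the pooling choice is a design decision that would reasonably be tuned on the validation set.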
Highlights
We explore three neural models: convolutional neural networks (CNNs), which are powerful models for text classification;[9] recurrent neural networks (RNNs) with attention, which are well suited to modelling sequential text data;[31] and a hierarchical attention network (HNN),[32] which takes into account the hierarchical structure of words, sentences and documents
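To make the CNN description concrete, the following minimal sketch shows the forward pass of a single-layer text CNN: word embeddings, a convolutional filter sliding over windows of consecutive words, ReLU, max-over-time pooling and a sigmoid output giving the probability that a risk of bias item is reported. It is written in pure Python for readability; the two-dimensional embeddings, the single width-2 filter and all weights are hand-set, hypothetical values, whereas a trained model would learn them and typically use many filters of several widths.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def conv_max_pool(embeddings: Sequence[Vector], filt: Sequence[Vector], bias: float) -> float:
    """Slide one filter over word windows, apply ReLU, and keep the maximum activation."""
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("document is shorter than the filter width")
    best = 0.0  # ReLU outputs are never negative
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for filt_row, word_vec in zip(filt, window)
            for w, x in zip(filt_row, word_vec)
        )
        best = max(best, max(0.0, activation))
    return best


def cnn_predict(
    tokens: Sequence[str],
    embedding_table: Dict[str, Vector],
    filters: Sequence[Sequence[Vector]],
    biases: Sequence[float],
    out_weights: Sequence[float],
    out_bias: float,
) -> float:
    """Return P(item reported) from max-pooled filter features and a logistic output layer."""
    dim = len(next(iter(embedding_table.values())))
    unknown = [0.0] * dim  # out-of-vocabulary words map to a zero vector
    embeddings = [embedding_table.get(tok, unknown) for tok in tokens]
    features = [conv_max_pool(embeddings, f, b) for f, b in zip(filters, biases)]
    logit = out_bias + sum(w * h for w, h in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy parameters: the single filter responds to the bigram "randomly allocated".
EMBEDDINGS = {"randomly": [1.0, 0.0], "allocated": [0.0, 1.0]}
FILTERS = [[[1.0, 0.0], [0.0, 1.0]]]
BIASES = [-1.0]
OUT_WEIGHTS = [4.0]
OUT_BIAS = -2.0

if __name__ == "__main__":
    for sentence in ("mice were randomly allocated", "mice were allocated randomly"):
        p = cnn_predict(sentence.split(), EMBEDDINGS, FILTERS, BIASES, OUT_WEIGHTS, OUT_BIAS)
        print(f"{sentence!r}: {p:.3f}")  # 0.881 for the first sentence, 0.119 for the second
```

The filter fires only when "randomly" is immediately followed by "allocated", illustrating how convolution captures local n-gram patterns while max pooling makes the prediction independent of where in the text the pattern appears.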
We define ‘True Positive’ as the number of records which report the risk of bias item and are predicted as reported; ‘True Negative’ as the number of records which do not report the risk of bias item and are predicted as unreported; ‘False Positive’ as the number of records which do not report the risk of bias item but are predicted as reported; and ‘False Negative’ as the number of records which report the risk of bias item but are predicted as unreported
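The definitions above translate directly into precision, recall and F1. The sketch below assumes binary labels (1 = reported, 0 = not reported) and F1 computed for the "reported" class; the function names are illustrative, and the labels in the example are made up rather than taken from the dataset.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 means 'reported'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # reported, predicted reported
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # unreported, predicted unreported
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unreported, predicted reported
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # reported, predicted unreported
    return tp, tn, fp, fn


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean) for the 'reported' class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))     # (3, 3, 1, 1)
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

F1 does not use the true negative count, so it reflects how well a model finds records that report an item rather than overall accuracy; returning 0.0 when a ratio is undefined is a convention assumed by this sketch.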
Summary
Systematic review is a type of literature review that attempts to collate all empirical evidence relevant to a pre-specified research question. It uses explicit and systematic methods to minimise bias and provide more reliable findings than narrative review.[1] After publications that meet pre-specified inclusion criteria have been collected, a critical step is assessing whether the included publications report strategies designed to reduce the risk of bias, which is central to judging the reliability of the research findings.[2] Risk of bias assessment is currently usually performed separately by two independent investigators, with an adjudicator resolving any disagreements. Automated tools for risk of bias assessment would have been useful in evaluating the impact of measures designed to improve the quality and completeness of research reporting, for instance the NPG Quality in Publication (NPQIP) study[3] and the Intervention to Improve Compliance with the ARRIVE guidelines (IICARus) studies;[4] such tools could also support future evaluation of reporting standards such as the Materials-Design-Analysis-Reporting Minimum Standards Framework[5] and the measurement of the impact of institutional research improvement activities.[6]