Toward assessing clinical trial publications for reporting transparency

Halil Kilicoglu, Graciela Rosemblat, Linh Hoang, Sahil Wadhwa, Zeshan Peng, Mario Malički, Jodi Schneider, Gerben Ter Riet

doi:10.1016/j.jbi.2021.103717

Abstract

Objective: To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of the CONSORT reporting guidelines and to use the corpus to develop text mining methods for RCT appraisal.

Methods: We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset (31 articles) was double-annotated and adjudicated, while the remaining 19 were annotated by a single annotator and reconciled by another. We calculated inter-annotator agreement at the article and section level using MASI (Measuring Agreement on Set-Valued Items) and at the CONSORT item level using Krippendorff's α. We experimented with two rule-based methods (phrase-based and section header-based) and two supervised learning approaches (support vector machine and BioBERT-based neural network classifiers) for recognizing 17 methodology-related items in the RCT Methods sections.

Results: We created CONSORT-TM, consisting of 10,709 sentences, 4,845 (45%) of which were annotated with 5,246 labels. A median of 28 CONSORT items (out of a possible 37) were annotated per article. Agreement was moderate at the article and section levels (average MASI: 0.60 and 0.64, respectively). Agreement varied considerably among individual checklist items (Krippendorff's α = 0.06–0.96). The model based on BioBERT performed best overall for recognizing methodology-related items (micro-precision: 0.82, micro-recall: 0.63, micro-F1: 0.71). Combining models using majority vote improved precision further, and combining them using label aggregation improved recall further.

Conclusion: Our annotated corpus, CONSORT-TM, contains more fine-grained information than earlier RCT corpora. The low frequency of some CONSORT items made it difficult to train effective text mining models to recognize them. For the commonly reported items, CONSORT-TM can serve as a testbed for text mining methods that assess RCT transparency, rigor, and reliability, and can support methods for peer review and authoring assistance. Minor modifications to the annotation scheme and a larger corpus could facilitate improved text mining models. CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.
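
The evaluation and model-combination steps mentioned in the abstract can be illustrated with a small sketch. It is not the authors' code: it assumes each sentence carries a set of CONSORT item labels, reads "label aggregation" as the union of the labels predicted by all models (an interpretation, not confirmed by the abstract), and uses invented toy predictions and hypothetical function names throughout.

```python
from collections import Counter


def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def majority_vote(predictions):
    """Keep a label only if more than half of the models predict it."""
    n_models = len(predictions)
    combined = []
    for per_sentence in zip(*predictions):
        counts = Counter(label for labels in per_sentence for label in labels)
        combined.append({l for l, c in counts.items() if c > n_models / 2})
    return combined


def label_union(predictions):
    """Assumed reading of 'label aggregation': keep every label any model predicts."""
    return [set().union(*per_sentence) for per_sentence in zip(*predictions)]


# Toy data (invented): four sentences, labels are CONSORT-style item IDs.
gold = [{"3a"}, {"4a"}, {"8a", "9"}, set()]
model_a = [{"3a"}, {"4a"}, {"8a"}, set()]
model_b = [{"3a"}, set(), {"8a", "9"}, {"11a"}]
model_c = [{"3a", "5"}, {"4a"}, {"8a"}, set()]
models = [model_a, model_b, model_c]

vote_scores = micro_prf(gold, majority_vote(models))
union_scores = micro_prf(gold, label_union(models))
print("majority vote (P, R, F1):", vote_scores)
print("label union   (P, R, F1):", union_scores)
```

On this toy data, voting discards labels that only one model proposes, which favors precision, while the union keeps every proposed label, which favors recall. That matches the direction of the trade-off the abstract reports, though the numbers here are purely illustrative.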

Journal: Journal of Biomedical Informatics
Publication Date: Feb 26, 2021
Citations: 15
License type: publisher-specific-oa
