Abstract
In an idealised vision of science, the scientific literature is error-free. Errors reported during peer review are supposed to be corrected prior to publication, since further research establishes new knowledge based on the body of literature. It happens, however, that errors pass through peer review, and in a minority of cases errata and retractions follow. Automated screening software can be applied to detect errors in manuscripts and publications. The contribution of this paper is twofold. First, we designed the erroneous reagent checking (ERC) benchmark to assess the accuracy of fact-checkers screening biomedical publications for dubious mentions of nucleotide sequence reagents. It comes with a test collection of 1679 nucleotide sequence reagents curated by biomedical experts. Second, we benchmarked our own screening software, called Seek&Blastn, with three input formats to assess the extent of performance loss when operating on various publication formats. Our findings stress the superiority of markup formats (a 79% detection rate on XML and HTML) over the prominent PDF format (a 69% detection rate at most) for the error-flagging task. This is the first published baseline on error detection involving reagents reported in biomedical scientific publications. The ERC benchmark is designed to facilitate the development and validation of software building blocks to enhance the reliability of the peer review process.
Highlights
The original purpose of a scientific paper is to be read
Based on the benchmark developed in this paper, our main finding is that PDF, the leading file format for publications, is not the most appropriate format for either knowledge discovery or fact-checking
To measure the extent to which the input format impedes error detection, we designed an original benchmark called erroneous reagent checking (ERC), presented in the second section. This is a generic benchmark, released as supplementary material, allowing the assessment of any fact-checker, namely systems that aim to spot errors in biomedical papers
Summary
The original purpose of a scientific paper is to be read. Reading scientific papers is the main way that scientists acquire knowledge (Volentine and Tenopir 2013). To measure the extent to which the input format impedes error detection, we designed an original benchmark called ERC (erroneous reagent checking), presented in the second section. This is a generic benchmark released as supplementary material (see “Appendix ERC benchmark ERC_H_v2 test collection”) allowing the assessment of any fact-checker, namely systems that aim to spot errors in biomedical papers. Fact-check_KO2 = q/s measures the proportion of nucleotide sequences for which a wrong decision was made (e.g. Class0 instead of Class8). At this point, the metrics defined can be used to answer the main question of this paper, namely: what is the performance decay (if any) when providing inputs in PDF format compared to other, more structured, formats? Experts performed manual fact-checking, involving reading the papers to identify nucleotide sequences, delineating the purpose of each nucleotide sequence in the reported experiments, and analysing the BLASTN results they obtained.
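To illustrate the Fact-check_KO2 = q/s ratio mentioned above, the following is a minimal sketch of how such a metric could be computed over a curated test collection. The function name, record layout, and class labels used here are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): computing a Fact-check_KO2-style
# ratio q/s over hypothetical benchmark records. Each record pairs the
# expert-assigned class of a nucleotide sequence with the class predicted
# by the automated fact-checker.

from typing import Iterable, Tuple


def fact_check_ko2(records: Iterable[Tuple[str, str]]) -> float:
    """Return q/s: the proportion of sequences with a wrong decision.

    `records` yields (expected_class, predicted_class) pairs; a wrong
    decision is any pair where the two classes differ, e.g. the tool
    reports Class0 where the experts assigned Class8.
    """
    records = list(records)
    s = len(records)  # total number of assessed nucleotide sequences
    q = sum(1 for expected, predicted in records if expected != predicted)
    return q / s if s else 0.0


# Hypothetical usage on three curated sequences:
print(fact_check_ko2([("Class8", "Class0"), ("Class0", "Class0"), ("Class8", "Class8")]))
# -> 0.333... (one wrong decision out of three sequences)
```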