A Survey of Bioinformatics Database and Software Usage through Mining the Literature.

Geraint Duck, Goran Nenadic, Robert Stevens, David L Robertson, Michele Filannino, Andy Brass, Shoba Ranganathan

doi:10.1371/journal.pone.0157989

Abstract

Computer-based resources are central to much, if not most, biological and medical research. However, while the biomedical literature describes an ever-expanding choice of bioinformatics resources, little work to date has evaluated the full range of available database and software resources or their levels of usage. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of extracted resources being mentioned only once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
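The imbalance figures quoted in the abstract (share of usage held by the top 5% of names, and the fraction of names mentioned only once) are simple summary statistics over per-resource mention counts. The following is a minimal sketch of how such figures could be computed, not the authors' code. The function name `usage_concentration` and the toy counts are assumptions made for illustration; the counts are invented and are not taken from the study's dataset.

```python
from collections import Counter


def usage_concentration(mention_counts, top_fraction=0.05):
    """Summarise how unevenly mentions are spread across resource names.

    Returns a pair:
      - share of all mentions held by the top `top_fraction` of names
      - share of names that are mentioned exactly once
    """
    counts = sorted(mention_counts.values(), reverse=True)
    if not counts:
        raise ValueError("no resources supplied")
    total = sum(counts)
    # Always include at least one name in the "top" group.
    top_n = max(1, round(len(counts) * top_fraction))
    top_share = sum(counts[:top_n]) / total
    singleton_share = sum(1 for c in counts if c == 1) / len(counts)
    return top_share, singleton_share


# Toy data (hypothetical): 5 frequently mentioned names plus 15 names
# that appear once each. These are NOT the paper's counts.
toy_counts = Counter({"BLAST": 40, "R": 20, "GO": 10, "ClustalW": 6, "PDB": 4})
toy_counts.update({f"tool_{i}": 1 for i in range(15)})

top_share, singleton_share = usage_concentration(toy_counts)
print(f"top 5% of names hold {top_share:.1%} of mentions")
print(f"{singleton_share:.1%} of names are mentioned only once")
```

With 20 names in the toy data, the top 5% is a single name, so the first figure is simply that name's share of all mentions.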

Highlights

Numerous database and software resources are published, used and mentioned within the medicine, biology and bioinformatics literature [1, 2].
DBcat [5] was incorporated into a regular special issue from Nucleic Acids Research (NAR) [6], which lists databases that have been published at some point within that journal, and the Bioinformatics Links Directory [7], which is an associated special issue from the same journal, but focuses instead on web-services.
This reduces to 3,926,176 total mentions (1,279,111 document mentions) once we apply the minimum postprocessing filtering confidence threshold of 80%. We use these results from bioNerDS to perform the following analyses: 1. Compare resource usage between this full corpus, and between medicine, biology and bioinformatics sub-corpora
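The highlight above distinguishes total mentions from document mentions after a confidence filter. As a rough sketch of that bookkeeping, the example below assumes that each mention carries a confidence score between 0 and 1, that the threshold is inclusive, and that a "document mention" is a distinct (document, resource) pair. None of these details are confirmed by the text shown here, and the `Mention` type, the `summarise_mentions` function and the sample records are hypothetical; this is not the bioNerDS implementation.

```python
from typing import Iterable, NamedTuple, Tuple


class Mention(NamedTuple):
    doc_id: str        # hypothetical document identifier
    resource: str      # resource name as extracted
    confidence: float  # assumed score in [0, 1]


def summarise_mentions(mentions: Iterable[Mention],
                       threshold: float = 0.80) -> Tuple[int, int]:
    """Return (total mentions, document mentions) at or above the threshold.

    A document mention is counted once per (document, resource) pair,
    however many times the resource is named in that document.
    """
    kept = [m for m in mentions if m.confidence >= threshold]
    document_pairs = {(m.doc_id, m.resource) for m in kept}
    return len(kept), len(document_pairs)


# Invented sample records, for illustration only.
sample = [
    Mention("PMC1", "BLAST", 0.95),
    Mention("PMC1", "BLAST", 0.90),  # same resource, same document
    Mention("PMC1", "R", 0.60),      # below threshold, dropped
    Mention("PMC2", "BLAST", 0.85),
    Mention("PMC2", "GO", 0.80),     # exactly at threshold, kept
]

total, per_document = summarise_mentions(sample)
print(total, per_document)
```

Counting per document as well as in total keeps a single paper that names one tool many times from dominating the usage figures.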

Summary

Introduction

Numerous database and software resources are published, used and mentioned within the medicine, biology and bioinformatics literature [1, 2]. Keeping up to date with bioinformatics resources is difficult, but it is a necessary part of modern data management and analysis within biology and medicine. An up-to-date knowledge of available databases and software would be a valuable resource [3]. Previous attempts have been made to maintain accurate lists of available bioinformatics resources, though most have not been sufficiently comprehensive, owing either to the slow process of manual curation or to specialised requirements for resource inclusion. Some previous work has been relatively limited in scope, for example, either focused on only a specific domain (e.g., phylogenetics software [8]), or focused on specific journals (e.g., Genome Biology and BMC Bioinformatics [9]).



Journal: PloS one	Publication Date: Jun 22, 2016
Citations: 47	License type: CC BY 4.0
