BioReader: a text mining tool for performing classification of biomedical literature

Christian Simon, Kristian Davidsen, Christina Hansen, Mike Bogetofte Barnkob, Emily Seymour, Lars Rønn Olsen

doi:10.1186/s12859-019-2607-x

Abstract

Background: Scientific data and research results are being published at an unprecedented rate. Many database curators and researchers use data and information from the primary literature to populate databases, form hypotheses, or as the basis for analyses or validation of results. These efforts largely rely on manual literature surveys to collect the data. Repositories such as PubMed enable keyword queries across the vast literature, but filtering the relevant articles from such query results can be a non-trivial and highly time-consuming task.

Results: We here present a tool that enables users to classify scientific literature by text mining of article abstracts. BioReader (Biomedical Research Article Distiller) is trained by uploading article corpora for two training categories (e.g. one positive and one negative for content of interest), together with one corpus of abstracts to be classified and/or a search string to query PubMed for articles. The corpora are submitted as lists of PubMed IDs. The abstracts are automatically downloaded from PubMed and preprocessed, and the unclassified corpus is then classified using the best performing of ten implemented classification algorithms.

Conclusion: BioReader supports data and information collection by implementing text mining-based classification of primary biomedical literature in a web interface, thus enabling curators and researchers to take advantage of the vast amounts of data and information in the published literature. BioReader outperforms existing tools with similar functionalities and expands the features used for mining literature in database curation efforts. The tool is freely available as a web service at http://www.cbs.dtu.dk/services/BioReader
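The workflow described in the abstract (two labeled training corpora, preprocessing, selection of the best performing classifier, classification of an unlabeled corpus) can be illustrated with a minimal Python sketch. This is not BioReader's implementation. The abstract does not name the ten algorithms or the preprocessing steps, so the two toy classifiers, the `tokenize` helper, the leave-one-out selection scheme, and the inline example abstracts are all assumptions made for illustration. Downloading abstracts by PubMed ID is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (hypothetical stand-in for preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative candidate 1)."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.priors}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.priors.values())

        def score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.priors[c] / n_docs)
            for w in tokenize(doc):
                s += math.log((self.counts[c][w] + 1) / total)
            return s

        return max(sorted(self.counts), key=score)


class CentroidCosine:
    """Picks the class whose pooled term counts are most cosine-similar (candidate 2)."""

    def fit(self, docs, labels):
        self.centroids = {c: Counter() for c in set(labels)}
        for doc, label in zip(docs, labels):
            self.centroids[label].update(tokenize(doc))
        return self

    def predict(self, doc):
        vec = Counter(tokenize(doc))

        def cosine(c):
            cen = self.centroids[c]
            dot = sum(vec[w] * cen[w] for w in vec)
            norm = math.sqrt(sum(v * v for v in vec.values())) * math.sqrt(
                sum(v * v for v in cen.values())
            )
            return dot / norm if norm else 0.0

        return max(sorted(self.centroids), key=cosine)


def loo_accuracy(make_model, docs, labels):
    """Leave-one-out accuracy on the training corpora (assumed selection criterion)."""
    hits = 0
    for i in range(len(docs)):
        model = make_model().fit(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += model.predict(docs[i]) == labels[i]
    return hits / len(docs)


def classify(positive, negative, unlabeled, candidates=(NaiveBayes, CentroidCosine)):
    """Train every candidate, keep the best by leave-one-out accuracy, label the rest."""
    docs = positive + negative
    labels = ["positive"] * len(positive) + ["negative"] * len(negative)
    best = max(candidates, key=lambda m: loo_accuracy(m, docs, labels))
    model = best().fit(docs, labels)
    return best.__name__, [model.predict(d) for d in unlabeled]


# Toy abstracts, invented for this example only.
positive = [
    "T cell epitope binding to MHC class I molecules",
    "peptide epitope recognized by cytotoxic T cells",
    "HLA restricted epitope elicits T cell response",
]
negative = [
    "metabolic pathway enzyme kinetics in yeast",
    "glucose metabolism and enzyme regulation",
    "yeast fermentation pathway flux analysis",
]
unlabeled = [
    "novel epitope presented to T cells by HLA",
    "enzyme flux in yeast glucose pathway",
]

chosen, predictions = classify(positive, negative, unlabeled)
print(chosen, predictions)
```

Selecting the classifier on held-out performance rather than fixing one in advance mirrors the "best performing classification algorithm" step in the abstract. A real setup would use far larger corpora and a more robust evaluation than leave-one-out on six documents.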

Highlights

Scientific data and research results are being published at an unprecedented rate
BioReader outperforms existing tools with similar functionalities and expands the features used for mining literature in database curation efforts
Many projects rely on manual curation of data and information extracted from the primary literature to compile highly useful databases, including MetaCyc – a curated database of experimentally elucidated metabolic pathways [2], the Immune Epitope Database (IEDB) [3], and the Tumor T cell Antigen database [4]

Journal: BMC Bioinformatics	Publication Date: Feb 1, 2019
Citations: 57	License type: open-access
