Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database

Neil R Smalheiser, Aaron M Cohen

doi:10.2478/dim-2018-0004

Open Access

Abstract

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and apply machine learning algorithms. At present, each research group tackles each problem from scratch, in isolation from other projects, which causes redundancy and a great deal of wasted effort. Here, we propose and describe the design of a generic platform for biomedical text mining that can serve both as a shared resource for machine learning projects and as a public repository for their outputs. We initially focus on one specific goal, namely classifying articles according to publication type, and emphasize how feature sets can be made more powerful and robust by using multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning-based goals and projects, and how it can also serve as a public platform for disseminating the results of natural language processing (NLP) tools to end-users.
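The abstract does not specify which similarity measures or which learner the platform uses. As a minimal sketch under stated assumptions, the Python below illustrates the general recipe it describes: labelled positive and negative articles, features derived from metadata, and a machine learning model. Everything concrete here is a hypothetical illustration rather than the authors' design. That includes the `Article` record and the three measures: title-word Jaccard overlap, MeSH-term Jaccard overlap, and a same-journal match, each taken as the maximum similarity to a separate reference set of known positives. It also includes the plain logistic regression and all of the invented toy data.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record holding a few PubMed-style metadata fields."""
    title: str
    mesh_terms: set[str] = field(default_factory=set)
    journal: str = ""


def title_words(article: Article) -> set[str]:
    # Lowercase, strip trailing punctuation, and drop very short words.
    words = (w.lower().strip(".,:;") for w in article.title.split())
    return {w for w in words if len(w) > 3}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_features(article: Article, reference: list[Article]) -> list[float]:
    """Three heterogeneous similarity measures (assumed, not from the paper),
    each taken as the maximum over a reference set of known positives."""
    return [
        max(jaccard(title_words(article), title_words(r)) for r in reference),
        max(jaccard(article.mesh_terms, r.mesh_terms) for r in reference),
        max(1.0 if article.journal == r.journal else 0.0 for r in reference),
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy data: the "positive" publication type is a randomized controlled trial.
reference = [
    Article("A randomized controlled trial of aspirin for stroke prevention",
            {"Aspirin", "Stroke", "Double-Blind Method", "Random Allocation"}, "Lancet"),
    Article("Randomized placebo-controlled trial of vitamin D supplementation",
            {"Vitamin D", "Placebos", "Random Allocation", "Double-Blind Method"}, "JAMA"),
]
train = [
    (Article("Randomized trial of aspirin versus placebo after stroke",
             {"Aspirin", "Stroke", "Double-Blind Method"}, "Lancet"), 1),
    (Article("Vitamin D supplementation: a randomized placebo-controlled trial",
             {"Vitamin D", "Placebos", "Random Allocation"}, "JAMA"), 1),
    (Article("Mechanisms of platelet aggregation: a review",
             {"Blood Platelets", "Platelet Aggregation"}, "Blood"), 0),
    (Article("Case report of a rare vitamin D deficiency",
             {"Vitamin D Deficiency"}, "BMJ Case Reports"), 0),
]
X = [similarity_features(a, reference) for a, _ in train]
y = [label for _, label in train]
model = train_logistic(X, y)

new_rct = Article("Randomized trial of statins for stroke prevention",
                  {"Stroke", "Double-Blind Method", "Random Allocation"}, "Lancet")
new_review = Article("Platelet biology: a narrative review", {"Blood Platelets"}, "Blood")
print(predict_proba(model, similarity_features(new_rct, reference)))
print(predict_proba(model, similarity_features(new_review, reference)))
```

The reference set is kept disjoint from the training examples so that no training article is scored against itself, which would otherwise leak its label into the features. A real system would presumably draw these fields from actual PubMed records and use a well-tested library classifier in place of the hand-rolled gradient descent.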

Journal: Data and information management
Publication Date: Jun 1, 2018
Citations: 1
License type: CC BY-NC-ND 3.0

Similar Papers

Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction
Ching-Wai Tan ... David T Jones
BMC Bioinformatics | VOL. 9
11 Feb 2008

Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds
Raquel Rodríguez-Pérez ... Martin Vogt
Journal of Chemical Information and Modeling | VOL. 57
10 Apr 2017

Topics in machine learning for biomedical literature analysis and text retrieval
Rezarta Islamaj Doğan ... Lana Yeganova
BMC Bioinformatics | VOL. 12
09 Jun 2011

Learning to Find Relevant Biological Articles without Negative Training Examples
Keith Noto ... Charles Elkan
01 Jan 2008