PubRunner: A light-weight framework for updating text mining results

Kishore R Anekalla, Ben Busby, Jake Lever, Michael Muchow, Nicolas Fiorini, J.P Courneya

doi:10.12688/f1000research.11389.2

Abstract

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
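The four-step workflow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not PubRunner's actual code: the function `run_pipeline`, its parameters, the `{input}`/`{output}` placeholders, and the word-counting stand-in tool are all hypothetical names introduced here, and the download, publish, and announce steps are replaced by local stubs so the example runs without network access.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_pipeline(fetch_abstracts, tool_command, publish, announce, workdir):
    """Run one update cycle: fetch, execute the tool, publish, announce."""
    workdir = Path(workdir)
    corpus = workdir / "abstracts.txt"
    output = workdir / "results.txt"

    # Step 1: obtain the latest abstracts (a real run would download from PubMed).
    corpus.write_text("\n".join(fetch_abstracts()), encoding="utf-8")

    # Step 2: execute the user-defined tool as an external command,
    # substituting the hypothetical {input} and {output} placeholders.
    command = [
        arg.replace("{input}", str(corpus)).replace("{output}", str(output))
        for arg in tool_command
    ]
    subprocess.run(command, check=True)

    # Step 3: push the results to a public location (FTP or Zenodo in the paper).
    location = publish(output)

    # Step 4: publicize where the results can be found.
    announce(location)
    return location


# Stand-in "text mining tool": counts whitespace-separated tokens in the corpus.
WORD_COUNT_TOOL = (
    "import sys, pathlib; "
    "n = len(pathlib.Path(sys.argv[1]).read_text().split()); "
    "pathlib.Path(sys.argv[2]).write_text(str(n))"
)


def demo():
    """Run the pipeline once with local stubs in place of network steps."""
    announced = []
    with tempfile.TemporaryDirectory() as tmp:
        location = run_pipeline(
            fetch_abstracts=lambda: [
                "first placeholder abstract",
                "second placeholder abstract",
            ],
            tool_command=[sys.executable, "-c", WORD_COUNT_TOOL, "{input}", "{output}"],
            publish=lambda path: "published:" + path.read_text(encoding="utf-8"),
            announce=announced.append,
            workdir=tmp,
        )
    return location, announced
```

Passing each step in as a callable keeps the wrapper independent of any particular tool or hosting service, which mirrors the idea of wrapping an existing text mining tool rather than modifying it.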

Highlights

The National Library of Medicine’s (NLM) PubMed database contains over 27 million citations and is growing exponentially (Lu, 2011).
To encourage biomedical text mining researchers to widely share their results and code, and to keep analyses up to date, we present PubRunner.
It wraps around a text mining tool and manages regular updates using the latest publications from PubMed.

Summary

13 Oct 2017

PubRunner can upload data to Zenodo, a data repository designed for very large datasets to encourage open science. This allows the output of text mining tools to be kept publicly available permanently. This data can be used for analyses of term similarity or as input to other machine learning algorithms (Mehryary et al.). This resource is valuable to the biomedical community, requires substantial compute and storage to create (which may be beyond the capability of smaller research groups), and is a good example of a resource that should be kept up-to-date. We hope this shows that PubRunner can be used with real text mining tools as well as the test cases that we had previously shown.
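As a rough illustration of the term-similarity analysis mentioned above, the sketch below ranks terms by cosine similarity between their vectors. The three-dimensional vectors and the term names are invented placeholders, not output from word2vec or from PubRunner; real word vectors would be loaded from the published dataset and would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, vectors):
    """Rank every other term by cosine similarity to `term`, best first."""
    query = vectors[term]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration only (hypothetical terms and values).
toy_vectors = {
    "term_a": [1.0, 0.0, 0.0],
    "term_b": [0.8, 0.6, 0.0],
    "term_c": [0.0, 0.0, 1.0],
}

ranking = most_similar("term_a", toy_vectors)
```

Here `term_b` ranks above `term_c` because its vector points in nearly the same direction as that of `term_a`, while the vector for `term_c` is orthogonal to it.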

Journal: F1000Research	Publication Date: Oct 13, 2017
Citations: 6	License type: CC BY 4.0
