What the papers say: text mining for genomics and systems biology.

Michael P H Stumpf, Wendy Filsell, Nathan Harmston

doi:10.1186/1479-7364-5-1-17


Open Access

https://doi.org/10.1186/1479-7364-5-1-17


Abstract

Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources reinventing the wheel simply because we can no longer maintain a reliable grasp of the published literature. Second, and perhaps more detrimental, the judicious (or serendipitous) combination of knowledge from different scientific disciplines, which requires following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining, the automated extraction of information from electronically published sources, could fulfil an important role here, but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect the rate at which scientific results are published to decrease, text mining tools are becoming essential for coping with, and deriving maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.

Highlights

The scientific literature provides an important source of knowledge generated by the research community; it does not become defunct five years after publication, and it is not just something to promote the authors’ careers.
The increasing number of papers being published makes it harder for researchers to stay up to date, which affects their ability to generate meaningful and testable hypotheses; some even suggest that this is becoming a bottleneck in the scientific discovery process.[4]
Only a relatively small number of papers are available for full-text mining, so most work is restricted to abstracts and titles, which are freely available from MEDLINE (only 30 per cent of curated protein-protein interactions (PPIs) can be found in the abstracts rather than only in the full text[9]).

Summary

Introduction

The scientific literature provides an important source of knowledge generated by the research community; it does not become defunct five years after publication, and it is not just something to promote the authors’ careers. While large amounts of data relating to biological systems are stored in public repositories, an even larger amount can be found in semi-structured form in the literature (see Figure 1). This knowledge is potentially very useful in a variety of genomics and systems biology contexts.[1] For example, manually curated and literature-derived protein-protein interaction datasets are typically used as gold standards by the systems biology community, and it is standard practice to extract parameters for mechanistic models from the literature. The increasing number of papers being published means that it is becoming harder for researchers to stay up to date with the relevant literature in their field. This affects their ability to generate meaningful and testable hypotheses, and some even suggest that this is becoming a bottleneck in the scientific discovery process.[4]
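To make the idea of literature-derived interaction data more concrete, the sketch below shows one of the simplest text mining baselines: recognise protein names in abstracts with a dictionary, normalise each synonym to a single canonical symbol, and count pairs of proteins that co-occur in the same sentence. This is an illustration under stated assumptions, not the method used in this review. The lexicon `LEXICON` and the helper names (`split_sentences`, `normalise_mentions`, `cooccurrence_pairs`) are hypothetical, the sentence splitter is deliberately naive, and the example sentences are made up. Co-occurrence only yields candidate relations; it does not establish that two proteins actually interact.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: lower-cased surface forms mapped to a canonical gene symbol.
# A real system would build this from a curated resource; these entries are illustrative only.
LEXICON = {
    "p53": "TP53",
    "tp53": "TP53",
    "tumor protein p53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "brca1": "BRCA1",
}

# Try longer synonyms first so "tumor protein p53" is matched as a whole, not as "p53".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalise_mentions(sentence: str) -> set[str]:
    """Dictionary-based recognition, then normalisation of each mention to its canonical symbol."""
    return {LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(sentence)}


def cooccurrence_pairs(abstracts: list[str]) -> Counter:
    """Count unordered pairs of distinct canonical entities that share a sentence."""
    counts: Counter = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            entities = sorted(normalise_mentions(sentence))
            counts.update(combinations(entities, 2))
    return counts


if __name__ == "__main__":
    # Made-up example sentences, used only to exercise the functions above.
    abstracts = [
        "HDM2 binds p53 and promotes its degradation. BRCA1 was not examined.",
        "We show that Mdm2 regulates tumor protein p53 levels in vivo.",
    ]
    print(cooccurrence_pairs(abstracts))  # Counter({('MDM2', 'TP53'): 2})
```

Normalisation matters here: without it, "HDM2", "Mdm2", "p53" and "tumor protein p53" would be counted as four unrelated names, and the two sentences would not reinforce the same candidate pair. Because the sketch works sentence by sentence on abstracts only, it inherits the coverage limitation noted above for abstract-based mining.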

Part of speech

BioCreative II FT

Entity normalisation

Relation extraction

Finding new applications for genetic algorithms using the WWW

Findings

Conclusion


Journal: Human genomics
Publication Date: Jan 1, 2010
Citations: 51
License type: cc-by


Similar Papers

A Variety of Text Mining Technology and Tools Research
Xiaojing Fan ... Xinhong Zhang
01 Jan 2014

IProLINK: A Framework for Linking Text Mining with Ontology and Systems Biology
Cathy H Wu ... K Bretonnel Cohen
01 Jan 2008

Machine learning in bioinformatics
Zhaoli
01 Dec 2011

Machine learning in bioinformatics
Iñaki Inza ... Victor Robles
Briefings in Bioinformatics | VOL. 7
01 Mar 2006
