Getting Started in Text Mining: Part Two

Andrey Rzhetsky, Mark B Gerstein, Michael Seringhaus, Olga G Troyanskaya

doi:10.1371/journal.pcbi.1000411

Journal: PLoS Computational Biology
Publication Date: Jul 31, 2009
Citations: 48
License type: CC BY 4.0

Affiliation: University of Chicago, Yale University

Highlights

We focus on the downstream questions scientists can ask using text-mining and literature-mining engines.
We begin at the top left of the figure, which shows the process of information retrieval: how we select relevant documents [2].
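Information retrieval ranks documents by how well they match a query. One widely used way to do this is TF-IDF weighting with cosine similarity, and the block below is a minimal sketch under that assumption. It is not necessarily the retrieval method discussed in [2]; the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_documents`) and the three toy abstracts are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency; rarer terms receive larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, docs):
    """Return (score, index) pairs for all documents, most relevant first."""
    doc_vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no weight and are dropped.
    q_tf = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(doc_vectors)]
    # Highest score first; ties are broken by the lower document index.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy abstracts used only to demonstrate the ranking.
abstracts = [
    "BRCA1 mutations increase breast cancer risk.",
    "Yeast cell cycle genes are periodically expressed.",
    "Germline BRCA1 and BRCA2 variants in breast cancer families.",
]
ranked = rank_documents("BRCA1 breast cancer", abstracts)
for score, idx in ranked:
    print(f"{score:.3f}  {abstracts[idx]}")
```

Both BRCA1 abstracts outrank the yeast abstract, which shares no query terms and scores zero. The shorter BRCA1 abstract scores higher because the query terms make up a larger share of its weight. Production search engines typically add stop-word removal, stemming, and inverted indexes; this sketch omits them for brevity.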

Introduction

This article is intended to continue where Cohen and Hunter [1] left off in “Getting Started in Text Mining,” an introduction in the January 2008 issue of PLoS Computational Biology, which covered the actual mining of text and its digestion into small quanta of computer-manageable information (http://www.ploscompbiol.org/doi/pcbi.0040020). In this overview of the field, we begin by summarizing the major stages of current text-processing pipelines. Named-entity recognition is closely related to the design of controlled terminologies [6] and ontologies for the annotation of texts and experimental data [7], a process often requiring a monumental community effort [8].
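The link between named-entity recognition and controlled terminologies is easiest to see in dictionary-based tagging: each synonym in the vocabulary points to a concept identifier, and the tagger scans the text for the longest matching phrase. The block below is a minimal sketch of that one approach, assuming a tiny hypothetical terminology (`TERMINOLOGY`, with placeholder identifiers such as `GENE:0001`) and a hypothetical helper `tag_entities`. Real terminologies are far larger, and practical taggers add normalization and disambiguation.

```python
import re

# Hypothetical mini-terminology: each synonym maps to a concept identifier.
# The identifiers are placeholders, not entries from any real vocabulary.
TERMINOLOGY = {
    "tumor protein p53": "GENE:0001",
    "p53": "GENE:0001",
    "tp53": "GENE:0001",
    "breast cancer": "DISEASE:0001",
    "breast carcinoma": "DISEASE:0001",
    "cancer": "DISEASE:0002",
}


def tag_entities(text, terminology):
    """Greedy longest-match dictionary lookup over word tokens.

    Returns (start_char, end_char, matched_text, concept_id) tuples.
    """
    # Keep character offsets so each match can be mapped back to the original text.
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    max_len = max(len(term.split()) for term in terminology)
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first so "breast cancer" wins over "cancer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for _, _, tok in tokens[i:i + n])
            if phrase in terminology:
                match = (n, terminology[phrase])
                break
        if match:
            n, concept = match
            start, end = tokens[i][0], tokens[i + n - 1][1]
            entities.append((start, end, text[start:end], concept))
            i += n  # skip past the matched phrase
        else:
            i += 1
    return entities


sentence = "Loss of TP53 is common in breast carcinoma and other cancers."
ents = tag_entities(sentence, TERMINOLOGY)
for ent in ents:
    print(ent)
```

Here “TP53” and “breast carcinoma” are tagged, while “cancers” is not, because the lookup is exact and the terminology lists only the singular form. Gaps like this are one reason careful terminology design and text normalization matter for named-entity recognition.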
