Abstract

A major challenge for the development of resources for functional and comparative genomics is the extraction of data from the biomedical literature. Although text retrieval and extraction for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that could be integrated into our curation workflow. MGI has rigorous document triage and annotation procedures designed to identify articles about mouse genome biology and to determine whether those articles should be curated. We currently screen approximately 1,000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data, and other key biological information. Although we do not foresee that human curation tasks can be fully automated in the near future, we are eager to implement entity name recognition and gene tagging tools that can help streamline our curation workflow and simplify gene indexing tasks in the MGI system. In this presentation, we discuss our search process and the steps we took to identify a short list of tools for further evaluation. We present our performance metrics, success criteria, and pilot projects in progress. The primary applications under current review are Fraunhofer SCAI's ProMiner and NCBO's Open Biomedical Annotator.

Highlights

  • Gather info & ideas from MGI staff; consult with text mining experts; develop dictionaries and performance metrics

  • Direct annotations are created from raw text according to a dictionary that uses terms from a set of ontologies

  • On a cluster of 16 processors, ProMiner can search the entire MEDLINE literature base with 1 dictionary in ~2 hours
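The direct-annotation step described above can be sketched as a simple dictionary lookup. The dictionary entries and identifiers below are hypothetical, and production systems such as ProMiner add curated synonym lists, tokenization rules, and disambiguation on top of this basic idea:

```python
import re

# Hypothetical dictionary mapping ontology terms to identifiers;
# real dictionaries are built from curated ontology and nomenclature files.
DICTIONARY = {
    "kinase activity": "GO:0016301",
    "Pax6": "MGI:97490",
    "apoptosis": "GO:0006915",
}

def annotate(text):
    """Return (matched term, identifier, start, end) for each dictionary hit."""
    hits = []
    for term, ident in DICTIONARY.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((m.group(0), ident, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])

if __name__ == "__main__":
    sample = "Pax6 mutants show increased apoptosis in the lens."
    for term, ident, start, end in annotate(sample):
        print(f"{term} -> {ident} [{start}:{end}]")
```

This naive scan is quadratic in dictionary size; at MEDLINE scale, systems use indexed or automaton-based matching, which is why ProMiner's cluster throughput in the highlight above matters.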


Summary

Evaluate and Decide

  • Gather info & ideas from MGI staff; consult with text mining experts; develop dictionaries and performance metrics

  • Develop and format a corpus of articles for testing; test, evaluate, review, repeat; present results to the GO team and refine specs

  • Identify the most promising systems and tools; test, evaluate, tailor, review, repeat

  • Present the short list and project results to MGI; solicit feedback and ideas; decide on next steps

  • Product of a BioCreative 2 challenge and the BioCreative MetaServer annotation server; open-access web service; processes up to 3,000 words (plain text only); retrieves article abstracts by PMID (Cheng-Ju Kuo, Institute of Information Science, Academia Sinica, Taiwan)
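The "develop performance metrics" and "test, evaluate, review" steps above imply scoring a tagger's output against a gold-standard curated corpus. A minimal sketch of such scoring, using hypothetical PMID/identifier pairs:

```python
def precision_recall(predicted, gold):
    """Compute precision, recall, and F1 over sets of (document, identifier) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: annotations the tagger got right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical gold-standard and tagger annotations
    gold = {("PMID:1", "MGI:97490"), ("PMID:1", "GO:0006915"), ("PMID:2", "GO:0016301")}
    pred = {("PMID:1", "MGI:97490"), ("PMID:2", "GO:0016301"), ("PMID:2", "MGI:95819")}
    p, r, f = precision_recall(pred, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # all 0.67 here
```

Treating annotations as document/identifier pairs keeps the metric simple; evaluations that also score mention boundaries would compare character offsets as well.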

Additional poster sections: Sample Semantically Expanded Annotation; Issues to address; System requirements

