Abstract

A major challenge for the development of resources for functional and comparative genomics is the extraction of data from the biomedical literature. Although text retrieval and extraction for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that could be integrated into our curation workflow. MGI has rigorous document triage and annotation procedures designed to identify articles about mouse genome biology and to determine whether those articles should be curated. We currently screen approximately 1,000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data, and other key biological information. Although we do not foresee that human curation tasks can be fully automated in the near future, we are eager to implement entity name recognition and gene tagging tools that can help streamline our curation workflow and simplify gene indexing tasks in the MGI system. In this presentation, we discuss our search process and the steps we took to identify a short list of tools for further evaluation. We present our performance metrics, success criteria, and pilot projects in progress. The primary applications under current review are Fraunhofer SCAI's ProMiner and NCBO's Open Biomedical Annotator.

Highlights

  • Gather info & ideas from MGI staff; consult with text mining experts; develop dictionaries and performance metrics

  • Direct annotations are created from raw text according to a dictionary that uses terms from a set of ontologies

  • On a cluster of 16 processors, ProMiner can search the entire MEDLINE literature base with 1 dictionary in ~2 hours
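The direct-annotation step described above can be sketched as a simple dictionary lookup. The dictionary entries and identifiers below are hypothetical, and production systems such as ProMiner add curated synonym lists, tokenization rules, and disambiguation on top of this basic idea:

```python
import re

# Hypothetical dictionary mapping ontology terms to identifiers;
# real dictionaries are built from curated ontology and nomenclature files.
DICTIONARY = {
    "kinase activity": "GO:0016301",
    "Pax6": "MGI:97490",
    "apoptosis": "GO:0006915",
}

def annotate(text):
    """Return (matched term, identifier, start, end) for each dictionary hit."""
    hits = []
    for term, ident in DICTIONARY.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((m.group(0), ident, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])

if __name__ == "__main__":
    sample = "Pax6 mutants show increased apoptosis in the lens."
    for term, ident, start, end in annotate(sample):
        print(f"{term} -> {ident} [{start}:{end}]")
```

This naive scan is quadratic in dictionary size; at MEDLINE scale, systems use indexed or automaton-based matching, which is why ProMiner's cluster throughput in the highlight above matters.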


Summary

Evaluate and Decide

  • Gather info & ideas from MGI staff; consult with text mining experts; develop dictionaries and performance metrics

  • Develop and format a corpus of articles for testing; test, evaluate, review, repeat; present results to the GO team and refine specs

  • Identify the most promising systems and tools; test, evaluate, tailor, review, repeat

  • Present the short list and project results to MGI; solicit feedback and ideas; decide on next steps

  • Product of a BioCreative 2 challenge and the BioCreative MetaServer annotation server; open-access web service; processes up to 3,000 words (plain text only); retrieves article abstracts by PMID (Cheng-Ju Kuo, Institute of Information Science, Academia Sinica, Taiwan)
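The "develop performance metrics" and "test, evaluate, review" steps above imply scoring a tagger's output against a gold-standard curated corpus. A minimal sketch of such scoring, using hypothetical PMID/identifier pairs:

```python
def precision_recall(predicted, gold):
    """Compute precision, recall, and F1 over sets of (document, identifier) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: annotations the tagger got right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical gold-standard and tagger annotations
    gold = {("PMID:1", "MGI:97490"), ("PMID:1", "GO:0006915"), ("PMID:2", "GO:0016301")}
    pred = {("PMID:1", "MGI:97490"), ("PMID:2", "GO:0016301"), ("PMID:2", "MGI:95819")}
    p, r, f = precision_recall(pred, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # all 0.67 here
```

Treating annotations as document/identifier pairs keeps the metric simple; evaluations that also score mention boundaries would compare character offsets as well.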

Additional poster sections: Sample Semantically Expanded Annotation; Issues to address; System requirements

