Abstract

The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism names and terms from most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and to a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow.

Database URL: http://rgd.mcw.edu
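The tagging and filtering described in the abstract can be pictured with a small Python sketch. This is not OntoMate's implementation or data model: the `TaggedAbstract` class, the `search` function, the record identifiers and the example tags are all hypothetical, chosen only to illustrate how concept tags stored with each abstract allow filtering by gene, species and date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TaggedAbstract:
    """Hypothetical record: an abstract plus the concepts tagged to it."""
    record_id: str
    published: date
    species: set[str] = field(default_factory=set)
    genes: set[str] = field(default_factory=set)
    ontology_terms: set[str] = field(default_factory=set)


def search(abstracts, *, gene=None, species=None, since=None):
    """Return the abstracts that pass every filter the user activated.

    A filter left as None is inactive, mirroring user-activated filters.
    """
    hits = []
    for item in abstracts:
        if gene is not None and gene not in item.genes:
            continue
        if species is not None and species not in item.species:
            continue
        if since is not None and item.published < since:
            continue
        hits.append(item)
    return hits


# Made-up records for illustration only; none come from RGD or PubMed.
corpus = [
    TaggedAbstract("demo-1", date(2012, 5, 1), {"rat"}, {"Ace"}, {"hypertension"}),
    TaggedAbstract("demo-2", date(2008, 3, 15), {"rat"}, {"Ace"}),
    TaggedAbstract("demo-3", date(2013, 7, 20), {"mouse"}, {"Ace", "Lepr"}),
]

recent_rat_hits = search(corpus, gene="Ace", species="rat", since=date(2010, 1, 1))
print([item.record_id for item in recent_rat_hits])
```

Because the tags travel with each record, the same list could also feed data entry boxes in a curation tool, which is the kind of linkage the abstract describes.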

Highlights

  • The Rat Genome Database (RGD, http://rgd.mcw.edu) has always looked for ways to improve curation efficiency by making use of software tools

  • As the initial effort to integrate text-mining tools into RGD’s curation workflow, we created an interface between OntoMate and RGD’s curation tool, so that OntoMate could replace PubMed as the literature search engine for gene curation

  • The average number of papers curated per curator per hour has increased from 2.10 to 2.83 after switching to OntoMate
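For scale, the two throughput figures in the last highlight correspond to a relative gain of roughly 35%. The short Python sketch below shows that arithmetic; the function name is illustrative and the percentage is derived here from the two reported values rather than quoted from the paper.

```python
def relative_increase(before: float, after: float) -> float:
    """Fractional change from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


# Papers curated per curator per hour, as reported in the highlights.
papers_per_hour_pubmed = 2.10
papers_per_hour_ontomate = 2.83

gain = relative_increase(papers_per_hour_pubmed, papers_per_hour_ontomate)
print(f"{gain:.1%}")  # prints "34.8%"
```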


Summary

Introduction

The Rat Genome Database (RGD, http://rgd.mcw.edu) has always looked for ways to improve curation efficiency by making use of software tools. From 2006 to 2009, the bioinformatics developers at RGD created a tool suite [1] to assist RGD’s curation process. The tools improved a process that originally had been based on spreadsheet data entry. The ontology annotation creation and editing tool serves as a data entry interface for the curation database. RGD biocurators relied on literature searches using PubMed’s interface [2] to locate articles for curation. The curators wanted a search engine that could interface with the gene curation tool.
