Abstract

Background

With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms.

Results

In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and the sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks terms about the gene using a score that compares the frequency of occurrence of a term in the gene's literature with its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene. An additional filtering step retains only those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for each gene is pre-computed, and users can search for genes by name or by EntrezGene identifier. iTerms are grouped into categories to facilitate quick inspection. eGIFT also links each iTerm to the sentences mentioning it, so that users can see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes: between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms.

Conclusions

Our evaluations suggest that iTerms capture highly relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey the results of high-throughput experiments, but also annotators find articles describing gene aspects and functions.
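The ranking step compares how often a term appears in a gene's abstracts with how often it appears in abstracts about genes in general. A minimal Python sketch of that idea follows. It assumes each abstract has already been reduced to a set of candidate terms, and it scores a term as the difference between the fraction of gene abstracts and the fraction of background abstracts that contain it. That formula is an illustrative assumption, not necessarily eGIFT's exact score, and `rank_iterms`, `top_k`, and the toy data are hypothetical.

```python
from collections import Counter


def doc_frequencies(docs: list[set[str]]) -> tuple[Counter, int]:
    """Count in how many documents each term occurs (at most once per document)."""
    counts: Counter = Counter()
    for terms in docs:
        counts.update(terms)
    return counts, len(docs)


def rank_iterms(gene_docs: list[set[str]],
                background_docs: list[set[str]],
                top_k: int = 10) -> list[tuple[str, float]]:
    """Rank terms by (fraction of gene docs containing the term)
    minus (fraction of background docs containing it).

    Hypothetical scoring rule, for illustration only.
    """
    g_counts, g_n = doc_frequencies(gene_docs)
    b_counts, b_n = doc_frequencies(background_docs)
    scores = {}
    for term, g_c in g_counts.items():
        g_frac = g_c / g_n
        # Counter returns 0 for terms never seen in the background set.
        b_frac = b_counts[term] / b_n if b_n else 0.0
        scores[term] = g_frac - b_frac
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data (invented for illustration): each abstract is a set of terms.
gene_docs = [{"apoptosis", "kinase", "cell"}, {"apoptosis", "cell"}, {"kinase", "cell"}]
background_docs = [{"cell", "protein"}, {"cell"}, {"protein", "kinase"}, {"cell", "apoptosis"}]

for term, score in rank_iterms(gene_docs, background_docs):
    print(f"{term}\t{score:.3f}")
```

A term that is common across all gene literature (here "cell") scores lower than terms over-represented in the gene's own abstracts. A ratio or a statistical test could replace the difference without changing the overall structure.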

Highlights

  • With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult

  • Experiment 1: Salience of iTerms (Precision). A major objective of eGIFT is to assist a researcher in recognizing the biomedical and molecular properties associated with a gene

  • A user can consult a list of iTerms extracted by eGIFT and examine the sentences associated with these iTerms to quickly identify important concepts and how they are related to the gene
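The evaluation reports precision as the share of iTerms that evaluators judged salient, and recall as the share of a gene's UniProtKB keywords that also appear among its iTerms. A small sketch of both computations follows, assuming exact case-insensitive matching between keywords and iTerms (the matching criteria used in the study may differ). The function names and the toy inputs are hypothetical and are not the paper's data.

```python
def precision(salience_judgments: list[bool]) -> float:
    """Fraction of iTerms that an evaluator marked as salient."""
    if not salience_judgments:
        return 0.0
    return sum(salience_judgments) / len(salience_judgments)


def recall(reference_keywords: set[str], iterms: set[str]) -> float:
    """Fraction of reference keywords (e.g. UniProtKB keywords) found among the iTerms.

    Assumes exact matching after lowercasing; a real evaluation might
    also accept synonyms or partial matches.
    """
    ref = {k.lower() for k in reference_keywords}
    found = ref & {t.lower() for t in iterms}
    return len(found) / len(ref) if ref else 0.0


# Toy inputs, invented for illustration only.
judgments = [True, True, False, True]
keywords = {"Apoptosis", "Kinase", "Zinc"}
gene_iterms = {"apoptosis", "kinase", "cell cycle"}

print(f"precision = {precision(judgments):.2f}")           # 0.75
print(f"recall    = {recall(keywords, gene_iterms):.2f}")  # 0.67
```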

Summary

Introduction

With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms. eGIFT extracts terms for a gene by comparing their frequencies in a set of the gene's documents with their frequencies in a background set. These terms, which we call iTerms (informative terms), provide a biologist with a synoptic understanding of a gene. iTerms are directly linked to sentences in the gene's abstracts that help a biologist better place them in a biological context.
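Linking each iTerm to the sentences that mention it can be sketched as below. The sentence splitter is a deliberately naive regular expression (an assumption: abbreviations such as "e.g." would need a proper biomedical sentence splitter), matching is case-insensitive on whole words, and `link_iterms_to_sentences`, the gene name "GeneX", and the example abstracts are hypothetical.

```python
import re
from collections import defaultdict

# Naive splitter: a sentence ends at ".", "!" or "?" followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def link_iterms_to_sentences(abstracts: list[str],
                             iterms: list[str]) -> dict[str, list[str]]:
    """Map each iTerm to the abstract sentences that contain it."""
    # Whole-word, case-insensitive pattern per term; lookarounds are used
    # instead of \b so terms starting or ending with punctuation still match.
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in iterms
    }
    links: dict[str, list[str]] = defaultdict(list)
    for abstract in abstracts:
        for sentence in SENTENCE_END.split(abstract.strip()):
            for term, pattern in patterns.items():
                if pattern.search(sentence):
                    links[term].append(sentence)
    # Terms with no matching sentence are simply absent from the result.
    return dict(links)


abstracts = [
    "GeneX regulates DNA repair. Loss of GeneX increases apoptosis in some cells.",
    "DNA repair defects were observed.",
]
for term, sentences in link_iterms_to_sentences(abstracts, ["DNA repair", "apoptosis"]).items():
    print(term, "->", sentences)
```

Returning whole sentences rather than fixed-width snippets keeps the surrounding context intact, which fits the goal of letting a biologist see how an iTerm relates to the gene.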

