Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model

Xin He,Xu Ling,Moushumi Sen Sarma,Bruce Schatz,Chengxiang Zhai,Brant Chee

doi:10.1186/1471-2105-11-272

Abstract

Background: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.

Results: We propose a statistical method that uses the primary literature, i.e., free text, as the source to perform overrepresentation analysis. The method is based on a mixture-model statistical framework and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results.

Conclusions: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp
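The abstract names a Poisson mixture model but does not spell out its form. Purely as a generic illustration, and not the authors' formulation, the sketch below fits a two-component Poisson mixture by EM to the counts of one term across the documents linked to a gene list. One component uses a fixed background rate (assumed here to be estimated beforehand from the whole collection), and the other uses a free "enriched" rate. The function names, the toy counts, and the background rate of 0.3 are all hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing count k at the given rate (> 0)."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def fit_two_component_mixture(counts, background_rate, n_iter=200):
    """Fit a two-component Poisson mixture with EM (illustrative only).

    The background component keeps its rate fixed; the enriched component
    has a free rate. Returns (enriched_weight, enriched_rate).
    """
    weight = 0.5
    # Start the enriched rate above the background so the components differ.
    enriched_rate = max(float(max(counts)), 2.0 * background_rate)
    for _ in range(n_iter):
        # E-step: probability that each count came from the enriched component.
        resp = []
        for k in counts:
            p_enriched = weight * poisson_pmf(k, enriched_rate)
            p_background = (1.0 - weight) * poisson_pmf(k, background_rate)
            resp.append(p_enriched / (p_enriched + p_background))
        total = sum(resp)
        if total < 1e-12:
            # No document is attributed to the enriched component.
            return 0.0, enriched_rate
        # M-step: update the mixing weight and the enriched rate.
        weight = total / len(counts)
        weighted_sum = sum(r * k for r, k in zip(resp, counts))
        enriched_rate = max(weighted_sum / total, 1e-9)
    return weight, enriched_rate


# Hypothetical term counts in 14 documents tied to a gene list.
toy_counts = [0, 0, 1, 0, 0, 0, 1, 0, 5, 6, 4, 7, 5, 6]
enriched_weight, enriched_rate = fit_two_component_mixture(toy_counts, 0.3)
print(f"enriched weight: {enriched_weight:.2f}, enriched rate: {enriched_rate:.2f}")
```

In this toy setting, a large enriched weight together with an enriched rate well above the background rate would suggest that the term is used unusually often in the gene list's literature. How the published method turns its mixture model into a significance score is not stated in the abstract.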

Highlights

Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns
One of the changes associated with the advances in genomic and systems biology is that biologists are no longer limited to studying one gene at a time
We proposed a new method based on a rigorous statistical model to identify overrepresented concepts in gene lists from free text

Summary

Introduction

Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). The annotation of genes is a labor-intensive process, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. Annotating genes with a controlled vocabulary requires the efforts of biologist curators, who need to read and digest a large amount of textual information.

Journal: BMC Bioinformatics	Publication Date: May 20, 2010
Citations: 46	License type: cc-by
