Gene Cluster Statistics with Gene Families

N Raghupathy, D Durand

doi:10.1093/molbev/msp002

Abstract

Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such “gene clusters” is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test.
We present additional simulation results indicating the best choice of parameter values for data analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematica code to compute cluster probabilities using our methods is available as supplementary material.
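The comparison described in the abstract, equal-sized gene families versus power-law distributed family sizes, can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical method or their Mathematica code. It assumes a random-gene-order null in which a window of r genes is a uniform random sample from its genome, treats the two windows as independent draws (the orthologous setting), and uses hypothetical names and parameters (`make_families`, `power_law_sizes`, `cluster_prob`, `alpha`, `max_size`) chosen only for illustration.

```python
import random


def make_families(sizes):
    """Build a genome as a list of family labels, one entry per gene."""
    genome = []
    for family, size in enumerate(sizes):
        genome.extend([family] * size)
    return genome


def power_law_sizes(n_families, alpha, max_size, rng):
    """Draw family sizes with P(size = s) proportional to s ** -alpha.

    alpha and max_size are illustrative assumptions, not fitted values.
    """
    support = list(range(1, max_size + 1))
    weights = [s ** -alpha for s in support]
    return rng.choices(support, weights=weights, k=n_families)


def cluster_prob(genome_a, genome_b, r, m, trials=20000, seed=0):
    """Estimate P(at least m gene families occur in both windows).

    Each window is r genes sampled without replacement from its genome,
    which is what a window of r adjacent genes looks like if gene order
    is random.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        families_a = set(rng.sample(genome_a, r))
        families_b = set(rng.sample(genome_b, r))
        if len(families_a & families_b) >= m:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(42)
    skewed = power_law_sizes(n_families=500, alpha=2.0, max_size=50, rng=rng)
    # Equal-size model with roughly the same number of genes and families.
    mean_size = max(1, round(sum(skewed) / len(skewed)))
    equal = [mean_size] * len(skewed)
    for label, sizes in (("equal", equal), ("power-law", skewed)):
        genome = make_families(sizes)
        p = cluster_prob(genome, genome, r=30, m=3, trials=5000)
        print(label, "P(shared families >= 3) ~", p)
```

A small estimated probability means that a cluster with that many shared families is unlikely to arise by chance under the stated null. Comparing the two printed estimates shows how the assumed family-size model alone can shift the result.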

Journal: Molecular Biology and Evolution
Publication Date: Jan 15, 2009
Citations: 22
License type: cc-by-nc

Similar Papers

Comparative analysis of gene family size provides insight into the adaptive evolution of vertebrates
Yu Meng ... Ruo Lin Yang
Yi chuan = Hereditas | VOL. 41
20 Feb 2019

A Bayesian model for gene family evolution
Liang Liu ... Venugopal Kalavacharla
BMC Bioinformatics | VOL. 12
01 Nov 2011

Evolution of Conserved Non-Coding Sequences Within the Vertebrate Hox Clusters Through the Two-Round Whole Genome Duplications Revealed by Phylogenetic Footprinting Analysis
Masatoshi Matsunami ... Naruya Saitou
Journal of Molecular Evolution | VOL. 71
28 Oct 2010

Neocortex expansion is linked to size variations in gene families with chemotaxis, cell-cell signalling and immune response functions in mammals.
Atahualpa Castillo-Morales ... Araxi O Urrutia
Open biology | VOL. 6
01 Oct 2016
