Abstract

Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features, and the migration together of genes related by function, but not by common descent [1], [2], [3]. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets: InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.

Highlights

  • Contemporary arrangements of genes along chromosomes are the products of mutation, viewed as genomic changes at all levels, followed by natural selection, and we can expect that preferable arrangements are likely to prevail over the span of evolutionary time

  • Some have focused on the description of tandemly arrayed genes (TAGs), reporting that an average of 14% of genes in vertebrates are clustered in that way [5], with similar metrics found for A. thaliana [6], but only 2% so clustered in yeast [7]

  • It is apparent that many paralogs are in proximity more frequently than what is expected by chance even when they are not in strictly tandem arrays because they are separated by interspersed, unrelated genes. This is true of the paraclusters whose members share the immunoglobulin superfamily domain defined by SCOP and the immunoglobulin-like domain defined by InterPro

Introduction

Contemporary arrangements of genes along chromosomes are the products of mutation, viewed as genomic changes at all levels, followed by natural selection, and we can expect that preferable arrangements are likely to prevail over the span of evolutionary time. Bretaudeau, Sallou, and Lecerf increased the sensitivity of their analysis by using modified BLASTP parameters to detect domain-level sequence similarities, and expanded the scope of gene clusters to include all structurally related genes residing within 2.5 Mb of each other. They reported that, among the vertebrates tested, an average of 30% of genes are present in structural clusters (http://dgd.geneoust.org) [8]. This is an acute problem for high-frequency domains that are likely to occur near each other by chance, and very likely led to an overestimate of structural clustering. In all such studies it is unclear how much information regarding gene clustering is missed due to the high levels of sequence divergence that arise between ancestrally duplicated genes that have remained in proximity since the common ancestor of all vertebrates, or even longer; detecting such arrangements requires more sensitive approaches, such as those involving Hidden Markov Models. In the present study our aim is to obtain new metrics and catalogues of clustered genes that both share structural features detected at the domain level and are highly significant in their proximity to one another, while allowing for sequence divergence to extend to the superfamily level.
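
The concern about chance co-occurrence of high-frequency domains can be illustrated with a small permutation test. The sketch below is not the procedure used in this study; it is a minimal example under stated assumptions: genes are given as hypothetical `(chromosome, position, family)` tuples, proximity is defined by the 2.5 Mb window mentioned above, and the null model shuffles family labels across the fixed gene positions. The function names `count_close_pairs` and `permutation_p_value` are illustrative only.

```python
import random
from collections import defaultdict

WINDOW_BP = 2_500_000  # 2.5 Mb window; value taken from the text above


def count_close_pairs(genes, window=WINDOW_BP):
    """Count same-family gene pairs on one chromosome within `window` bp.

    `genes` is a list of (chromosome, position, family) tuples
    (a hypothetical input format chosen for this illustration).
    """
    by_group = defaultdict(list)
    for chrom, pos, family in genes:
        by_group[(chrom, family)].append(pos)

    pairs = 0
    for positions in by_group.values():
        positions.sort()
        left = 0
        # Sliding window: for each gene, count earlier genes within range.
        for right in range(len(positions)):
            while positions[right] - positions[left] > window:
                left += 1
            pairs += right - left
    return pairs


def permutation_p_value(genes, n_perm=1000, window=WINDOW_BP, seed=0):
    """Estimate how often shuffled family labels give at least as many
    close pairs as observed. Gene positions stay fixed, so gene density
    and family sizes are preserved under the null (an assumed null model).
    """
    rng = random.Random(seed)
    observed = count_close_pairs(genes, window)
    families = [family for _, _, family in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(families)
        shuffled = [(c, p, f) for (c, p, _), f in zip(genes, families)]
        if count_close_pairs(shuffled, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the labels are shuffled over real gene positions, a family with many members will produce many close pairs under the null as well, which is the sense in which abundant domains can appear clustered without being so.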
