Genome-wide computational prediction of tandem gene arrays: application in yeasts

Laurence Despons, Pascal Durrens, Jean-Luc Souciet, Lionel Frangeul, Philippe V Baret, Véronique Louis

doi:10.1186/1471-2164-11-56

Abstract

Background: This paper describes an efficient in silico method for detecting tandem gene arrays (TGAs) in fully sequenced and compact genomes such as those of prokaryotes or unicellular eukaryotes. The originality of this method lies in the search for protein sequence similarities in the vicinity of each coding sequence, which allows the prediction of tandem duplicated gene copies independently of their functionality.

Results: Applied to nine hemiascomycete yeast genomes, this method predicts that 2% of the genes are involved in TGAs and that gene relics are present in 11% of TGAs. The frequency of TGAs with degenerated gene copies indicates that a significant fraction of tandem duplicated genes follows the birth-and-death model of evolution. A comparison of sequence identity distributions between sets of homologous gene pairs shows that the copies of tandem arrayed paralogs are less divergent than the copies of dispersed paralogs in yeast genomes. This suggests that paralogs included in tandem structures are more recent, or more subject to gene conversion, than other paralogs.

Conclusion: The method reported here is a useful computational tool for building a database of TGAs composed of functional or nonfunctional gene copies. Such a database has obvious applications in structural and comparative genomics. Notably, a detailed study of the TGA catalog will make it possible to tackle the fundamental questions of the origin and evolution of tandem gene clusters.

Highlights

This paper describes an efficient in silico method for detecting tandem gene arrays (TGAs) in fully sequenced and compact genomes such as those of prokaryotes or unicellular eukaryotes.
When several high-scoring segment pairs (HSPs) were obtained between the protein sequence of a given coding sequence (CDS) and one of its flanking regions, the total TB score was calculated as the sum of the bit scores of all HSPs that lie on the same strand and do not overlap by more than 20%.
TGAs would be breeding grounds for new genes. We will address these interesting questions with a detailed analysis of the TGA catalog produced by our computational prediction method from hemiascomycete yeast genomes
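
The total TB score described in the highlights above can be illustrated with a short Python sketch. It is not the authors' implementation. The text does not say on which coordinates the overlap is measured, how the 20% is normalised, or how the strand is chosen, so the sketch makes three assumptions: overlap is measured on the query protein and expressed relative to the shorter HSP, HSPs are kept greedily in order of decreasing bit score, and the strand with the highest total is reported. The `HSP` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HSP:
    """One high-scoring segment pair (hypothetical container, not a BLAST type)."""
    q_start: int      # start on the query protein, 1-based inclusive
    q_end: int        # end on the query protein, inclusive
    strand: str       # strand of the flanking region: '+' or '-'
    bit_score: float


def overlap_fraction(a: HSP, b: HSP) -> float:
    """Overlap on the query, relative to the shorter HSP (assumption)."""
    shared = min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.q_end - a.q_start + 1, b.q_end - b.q_start + 1)
    return shared / shorter


def total_bit_score(hsps: Iterable[HSP], max_overlap: float = 0.20) -> float:
    """Sum bit scores of same-strand HSPs overlapping by at most max_overlap."""
    hsps = list(hsps)
    best = 0.0
    for strand in {h.strand for h in hsps}:
        same_strand = [h for h in hsps if h.strand == strand]
        kept: List[HSP] = []
        # Greedy selection, strongest HSP first (assumption).
        for hsp in sorted(same_strand, key=lambda h: h.bit_score, reverse=True):
            if all(overlap_fraction(hsp, k) <= max_overlap for k in kept):
                kept.append(hsp)
        # Report the best-scoring strand (assumption).
        best = max(best, sum(h.bit_score for h in kept))
    return best


if __name__ == "__main__":
    example = [
        HSP(1, 100, "+", 80.0),
        HSP(90, 200, "+", 60.0),   # overlaps the first HSP by 11%: kept
        HSP(50, 120, "+", 40.0),   # overlaps the first HSP by 72%: dropped
        HSP(1, 150, "-", 100.0),   # other strand, scored separately
    ]
    print(total_bit_score(example))
```

Measuring overlap on the subject (nucleotide) coordinates, or normalising by the longer HSP, would be an equally plausible reading and would only require changing `overlap_fraction`.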

Summary

Introduction

This paper describes an efficient in silico method for detecting tandem gene arrays (TGAs) in fully sequenced and compact genomes such as those of prokaryotes or unicellular eukaryotes. The originality of this method lies in the search for protein sequence similarities in the vicinity of each coding sequence, which allows the prediction of tandem duplicated gene copies independently of their functionality. Different methods have been used to achieve a systematic characterization of TGAs in eukaryotic genomes. All these methods derive from those primarily used to identify any duplicate genes in complete genomes [1,2,3] and take into account supplementary data concerning the chromosomal location of the detected duplicate genes.

Methods

Results

Discussion

Conclusion