Abstract

Since the first high-quality eukaryotic genome assemblies became available, the large-scale analysis of the origin of new genes has come into the focus of many studies (Shoja & Zhang, 2006; Zhou et al., 2008). New genes can originate through multiple mechanisms, including gene duplication, gene fusion/fission, exon shuffling, retroposition, horizontal gene transfer, and de novo origination from noncoding sequences (Long et al., 2003). Although initial models proposed that new copies of genes soon become nonfunctional (Nei & Roychoudhury, 1973; Ohno, 1970), it has since been shown for numerous genes that duplicates retain function through redundancy, subfunctionalization, or neofunctionalization (Hahn, 2009; Li et al., 2005; Massingham et al., 2001). While de novo origination from noncoding sequence has been shown to play an unexpectedly important role (Zhou et al., 2008), most new genes are derived through duplication. Gene duplicates are normally classified into dispersed and tandem duplicates. Tandem duplications of clusters of genes, single genes, groups of exons, or single exons are thought to be formed by unequal crossing-over events or misaligned homologous recombinational repair (Babushok et al., 2007; Zhang, 2003). A comparative analysis of the human, mouse, and rat genomes has shown that about 15 % of all genes represent tandemly arrayed genes (Shoja & Zhang, 2006). A similar proportion of about 20 % has been found for the fruit fly Drosophila melanogaster (Quijano et al., 2008). All these analyses depend on the particular dataset of annotated genes used and on the specific method for defining genes as tandem genes. However, first annotations of genomes are in most cases produced by automatic gene prediction programs, nowadays often supported by additional EST data, and therefore miss many genes, include artificially fused neighbouring genes, and contain mispredicted exons and introns.
Although these errors seem small, they are critical when distinguishing tandem gene duplicates from duplications of genomic regions and from trans-spliced genes. In addition, defining tandem genes by a maximum number of intervening nucleotides cannot separate tandem gene duplicates from duplications of small genomic regions. Tandemly arrayed gene duplicates are often conserved between species. Examples are the olfactory receptor genes, which constitute a very large gene family of several hundred genes per species in vertebrates (Aloni et al., 2006), and the HOX genes (Garcia-Fernandez, 2005; Zhang & Nei, 1996). While algorithms have been developed to reconstruct the history and evolution of tandemly arrayed genes (Bertrand et al., 2008; Elemento et al., 2002), specific programs are not available for the prediction and local reconstruction of these gene arrays.
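
The limitation of a purely distance-based definition can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any of the cited studies: the `Gene` record, the `family` label standing in for a homology assignment, the `tandem_arrays` function, the coordinates, and the `max_gap` threshold are all hypothetical, and a single chromosome is assumed. Same-family genes are chained whenever the gap to the previous family member does not exceed `max_gap`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    family: str  # hypothetical homology-family label
    start: int   # start coordinate on one chromosome
    end: int     # end coordinate on the same chromosome


def tandem_arrays(genes, max_gap):
    """Chain same-family genes whose gap to the previous member is <= max_gap."""
    by_family = defaultdict(list)
    for gene in genes:
        by_family[gene.family].append(gene)

    arrays = []
    for members in by_family.values():
        members.sort(key=lambda g: g.start)
        current = [members[0]]
        for gene in members[1:]:
            if gene.start - current[-1].end <= max_gap:
                current.append(gene)
            else:
                # Gap too large: close the running chain, keep it if it has >= 2 genes.
                if len(current) > 1:
                    arrays.append([g.name for g in current])
                current = [gene]
        if len(current) > 1:
            arrays.append([g.name for g in current])
    return arrays


# Hypothetical layout A1 B1 A2 B2, as expected after one duplication
# of a small region that carries both gene A and gene B.
region = [
    Gene("A1", "A", 0, 1000),
    Gene("B1", "B", 1500, 2500),
    Gene("A2", "A", 3000, 4000),
    Gene("B2", "B", 4500, 5500),
]
calls = tandem_arrays(region, max_gap=5000)
```

With this threshold, both A1/A2 and B1/B2 are reported as tandem arrays, although the layout is equally consistent with a single duplication of the region containing A and B. The distance criterion alone cannot tell the two scenarios apart.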
