Abstract

The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the widespread single copy status of a gene.

Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality, we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the phylogenetic procedure increased the number of single copy orthologs found by more than a third relative to standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence.

To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.

Highlights

  • Not long after the release of the first bacterial genome sequence [1], large-scale identification of gene families from multiple organisms became feasible [2,3,4,5] and allowed them to be classified into groups according to their homologous relationships [6]

  • Since defining a clear 1-to-1 relationship between two genes is sometimes complex, operational orthologous groups have been introduced [7] that allow difficult cases to be resolved; these groups depend on the genomes and taxonomic levels used to derive the respective gene sets [6]

  • By comparing our orthologs to those predicted by other datasets we show that our procedure identifies a significantly larger set of single copy orthologs in the Metazoa

Introduction

Not long after the release of the first bacterial genome sequence [1], large-scale identification of gene families from multiple organisms became feasible [2,3,4,5] and allowed them to be classified into groups according to their homologous relationships [6]. Since defining a clear 1-to-1 relationship between two genes is sometimes complex, operational orthologous groups have been introduced [7] that allow difficult cases to be resolved; these groups depend on the genomes and taxonomic levels used to derive the respective gene sets [6]. This is illustrated nicely with an example from the eggNOG database version 1 (evolutionary genealogy of genes: Non-supervised Orthologous Groups) [14], which groups genes into families at different taxonomic levels, balancing phylogenetic coverage and resolution. The definition of genes in a pair of species as single copy orthologs implies that they have kept this status since the species last shared a common ancestor [20].
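To make the terminology concrete, the following is a minimal sketch of the kind of taxon-count criterion that the abstract contrasts with the phylogenetic procedure, together with a simple completeness measure. It is not the authors' pipeline: the function names, the input layout (a mapping from group ID to `(species, gene)` pairs) and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return IDs of groups with exactly one gene in every listed species.

    `groups` maps a group ID to a list of (species, gene) pairs.
    This is a plain taxon-count filter, an illustrative baseline only.
    """
    required = set(species)
    selected = []
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _gene in members)
        # Keep the group only if every species is present exactly once.
        if set(counts) == required and all(n == 1 for n in counts.values()):
            selected.append(group_id)
    return selected


def completeness(reference_groups, found_groups):
    """Percentage of reference single copy groups represented in a gene set."""
    reference = set(reference_groups)
    if not reference:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference & set(found_groups)) / len(reference)


# Hypothetical toy data: three species, four candidate groups.
species = ["human", "fly", "worm"]
groups = {
    "OG1": [("human", "h1"), ("fly", "f1"), ("worm", "w1")],
    "OG2": [("human", "h2"), ("human", "h3"), ("fly", "f2"), ("worm", "w2")],
    "OG3": [("human", "h4"), ("fly", "f3")],
    "OG4": [("human", "h5"), ("fly", "f4"), ("worm", "w3")],
}

reference = single_copy_groups(groups, species)  # OG2 is duplicated, OG3 lacks worm
coverage = completeness(reference, {"OG1", "OG9"})  # a gene set hitting only OG1
```

A filter of this kind discards any family with a lineage-specific duplication or a missing species, which is why a tree-based procedure can be expected to recover additional single copy orthologs from multigene families.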
