Abstract
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available.
Database URL: http://www.ensembl.org.
Highlights
The number of publicly available chordate genomes has been increasing at a fast pace since the publication of the human genome sequence [1, 2] and is expected to increase further in the coming years due to continuous advances in sequencing technologies.
We have previously described our algorithm for producing protein-coding orthology and paralogy annotations [12] as well as the algorithms used to create our whole genome multiple alignments [26, 27].
Although the Ensembl comparative genomics infrastructure has been developed for the analysis of the chordate genomes present in Ensembl, it has been successfully used for other clades such as plants [25] and bacteria [24].
Summary
The number of publicly available chordate genomes has been increasing at a fast pace since the publication of the human genome sequence [1, 2] and is expected to increase further in the coming years due to continuous advances in sequencing technologies. Comparative analysis is such an important tool to better characterize genomes that a set of 29 mammalian genomes, including 22 sequenced for the project, was analysed together as a means to understand the human genome [3]. Comparative genomics analyses can focus on the similarities and differences between the annotation or between the sequence of two or more genomes. Pairwise and multiple whole-genome alignments are used to compare genome sequences. Pairs of genes can be annotated as orthologues or paralogues [7]. Despite recent concerns about the orthology conjecture [8], orthologues tend to be more similar in function than paralogues [9] and are widely used in gene annotation [10, 11].