Abstract

By identifying genomic sequence regions conserved among several species, comparative genomics offers opportunities to discover putatively functional elements without any prior knowledge of what their functions might be. Comparative analyses across mammals have estimated 4–5% of the human genome to be functionally constrained, a much larger fraction than the 1–2% occupied by annotated protein-coding or RNA genes. Such functionally constrained yet unannotated regions have been referred to as conserved non-coding sequences (CNCs) or ultra-conserved elements (UCEs); they remain largely uncharacterized but probably form a highly heterogeneous group of elements including enhancers, promoters, motifs, and others. To facilitate the study of such CNCs/UCEs, we present our resource of Conserved Elements from Genomic Alignments (CEGA), accessible from http://cega.ezlab.org. Harnessing the power of multiple species comparisons to detect genomic elements under purifying selection, CEGA provides a comprehensive set of CNCs identified at different radiations along the vertebrate lineage. Evolutionary constraint is identified using threshold-free phylogenetic modeling of unbiased and sensitive global alignments of genomic synteny blocks, which are themselves identified using protein orthology. We identified CNCs independently for five vertebrate clades, each referring to a different last common ancestor and therefore to an overlapping but varying set of CNCs: 24 488 in vertebrates, 241 575 in amniotes, 709 743 in Eutheria, 642 701 in Boreoeutheria and 612 364 in Euarchontoglires, spanning from 6 Mbp in vertebrates to 119 Mbp in Euarchontoglires. The dynamic CEGA web interface displays alignments, genomic locations, and biologically relevant data to help prioritize and select CNCs of interest for further functional investigations.

Highlights

  • Hypothesizing on gene functions is instrumental for many studies in molecular biology

  • In 2016 OrthoDB reached its 9th release, growing to over 22 million genes from over 5000 species, adding plants, archaea and viruses. In this update we focused on usability of this fast-growing wealth of data: updating the user and programmatic interfaces to browse and query the data, and further enhancing the already extensive integration of available gene functional annotations

  • As the generation of sequencing data grows much faster than experimental interrogations of gene functions, orthology is the best way to link the knowledge acquired in model organisms to a much wider scope of genomics [2]


Introduction

Hypothesizing on gene functions is instrumental for many studies in molecular biology. OrthoDB continues to provide computed evolutionary annotations and to allow user queries by sequence homology. The web resource presenting the OrthoDB data lets users in identified sessions analyze custom data sets in the context of the available orthology data and generate publication-quality comparative genomics reports.

