Abstract

It is becoming increasingly clear that a significant proportion of the functional sequence within eukaryotic genomes is noncoding. However, since the identification of conserved elements (CEs) has been restricted to a limited number of model organisms, the dynamics and evolutionary character of the genomic landscape of conserved, and hence likely functional, sequence is poorly understood in most species. Moreover, identification and analysis of the full suite of functional sequence are particularly important for the understanding of the genetic basis of trait loci identified in genome scans or quantitative trait locus mapping efforts. We report that ~6.6% of the collared flycatcher genome (74.0 Mb) is spanned by ~1.28 million CEs, a higher proportion of the genome but a lower total amount of conserved sequence than has been reported in mammals. We identified >200,000 CEs specific to either the archosaur, avian, neoavian or passeridan lineages, constituting candidates for lineage-specific adaptations. Importantly, no less than ~71% of CE sites were nonexonic (52.6 Mb), and conserved nonexonic sequence density was negatively correlated with functional exonic density at local genomic scales. Additionally, nucleotide diversity was strongly reduced at nonexonic conserved sites (0.00153) relative to intergenic nonconserved sites (0.00427). By integrating deep transcriptome sequencing and additional genome annotation, we identified novel protein-coding genes, long noncoding RNA genes and transposon-derived (exapted) CEs. The approach taken here, based on the use of a progressive cactus whole-genome alignment to identify CEs, should be readily applicable to nonmodel organisms in general and help to reveal the rich repertoire of putatively functional noncoding sequence as targets for selection.

Highlights

  • The identification of selectively constrained sites within a genome, and analysis of their character and distribution, is an issue of central importance for the ability to link genotypes with phenotypes

  • It is important to incorporate putatively functional noncoding sequence alongside protein-coding sequence when approximating the density and distribution of genomic targets for selection. While this is currently applied for a limited number of well-studied taxa such as human (Enard, Messer, & Petrov, 2014; Hernandez et al, 2011; Lohmueller et al, 2011; McVicker, Gordon, Davis, & Green, 2009) and Drosophila melanogaster (Comeron, 2014; Elyashiv et al, 2016; Halligan & Keightley, 2006; Sella, Petrov, Przeworski, & Andolfatto, 2009), the necessary annotation of noncoding sequence is typically unavailable for studies of nonmodel organisms

  • Following Lindblad-Toh et al (2011), the intersect between conserved elements (CEs) and the various annotated classes was defined in a hierarchical format, such that if a base within a CE overlapped two or more different classes, it was assigned to a single class based on the first appearance in the following order: coding sequence (CDS), 5' UTR, 3' UTR, promoter, RNA gene, novel CDS, novel UTR, long noncoding RNA (lncRNA), intronic, intergenic
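
The hierarchical assignment described in the last highlight can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the half-open (start, end, class) interval format, and the fallback of unannotated bases to "intergenic" are all assumptions made for illustration. Only the priority order is taken from the text.

```python
# Priority order as listed in the text; classes earlier in the list win.
PRIORITY = [
    "CDS", "5' UTR", "3' UTR", "promoter", "RNA gene",
    "novel CDS", "novel UTR", "lncRNA", "intronic", "intergenic",
]
RANK = {name: i for i, name in enumerate(PRIORITY)}


def assign_class(overlapping):
    """Return the highest-priority class among those overlapping one base."""
    known = [c for c in overlapping if c in RANK]
    if not known:
        # Assumption: a base with no annotation falls through to the last class.
        return "intergenic"
    return min(known, key=RANK.__getitem__)


def classify_element(start, end, annotations):
    """Count bases per class for one CE on the half-open interval [start, end).

    `annotations` is a hypothetical list of (start, end, class_name) tuples,
    also half-open. Each base is assigned to exactly one class.
    """
    counts = {}
    for pos in range(start, end):
        hits = [cls for (s, e, cls) in annotations if s <= pos < e]
        cls = assign_class(hits)
        counts[cls] = counts.get(cls, 0) + 1
    return counts


# A 10-bp CE overlapping a CDS, an intron and a lncRNA.
example = classify_element(
    100, 110,
    [(95, 104, "CDS"), (102, 108, "intronic"), (100, 110, "lncRNA")],
)
print(example)
```

The per-base loop is chosen for readability; for ~1.28 million CEs an interval-based tool would be the practical choice, but the precedence logic would be the same.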


Summary

| INTRODUCTION

The identification of selectively constrained sites within a genome, and analysis of their character and distribution, is an issue of central importance for the ability to link genotypes with phenotypes. It is important to incorporate putatively functional noncoding sequence alongside protein-coding sequence when approximating the density and distribution of genomic targets for selection. While this is currently applied for a limited number of well-studied taxa such as human (Enard, Messer, & Petrov, 2014; Hernandez et al, 2011; Lohmueller et al, 2011; McVicker, Gordon, Davis, & Green, 2009) and Drosophila melanogaster (Comeron, 2014; Elyashiv et al, 2016; Halligan & Keightley, 2006; Sella, Petrov, Przeworski, & Andolfatto, 2009), the necessary annotation of noncoding sequence is typically unavailable for studies of nonmodel organisms. We further incorporate deep transcriptome sequencing and TE annotation to improve genome annotation and identify thousands of CEs putatively derived from TE exaptation events.
