Abstract

It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified.
Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
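
The "more than half" statement in the abstract can be checked with simple arithmetic. As a rough sketch, assume (this is an assumption for illustration, not something the abstract states) that all annotated protein-coding sequence falls inside the fraction under indel-purifying selection. Writing f_sel for that fraction (2.56% to 3.25% of euchromatin) and f_cds = 1.2% for the protein-coding fraction, the non-protein-coding share of the selected sequence is

\[
1 - \frac{f_{\mathrm{cds}}}{f_{\mathrm{sel}}}, \qquad
1 - \frac{1.2}{2.56} \approx 0.53, \qquad
1 - \frac{1.2}{3.25} \approx 0.63 .
\]

Under that assumption, roughly 53% to 63% of the sequence under indel-purifying selection would be non-protein-coding, which is consistent with the claim.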

Highlights

  • The human genome has been shaped by the evolutionary forces of mutation, genetic drift, and selection, with the latter acting, in the main, to purify functional regions of deleterious mutations

  • We show that deviations from the model are, in the main, not caused by variations in the neutral indel rates, but are consistent with selection acting to purify the genome of deleterious indels that arise in functional regions

  • Restricting ourselves to ancestral repeats (ARs), transposable elements (TEs) inserted before the human–mouse split, we found a near-exact fit between observations and the neutral model predictions
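
The idea of calibrating a neutral model on ancestral repeats and then looking for unusually long ungapped regions elsewhere can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes, purely for illustration, that neutral indels strike each site independently with a constant probability, so that ungapped segment lengths follow a geometric distribution. All function names, the value `true_p = 0.05`, and the synthetic data are hypothetical.

```python
import math
import random


def geometric_tail(p, length):
    """P(ungapped segment length >= length) if each site ends the segment with probability p."""
    return (1.0 - p) ** (length - 1)


def fit_indel_prob(lengths):
    """Maximum-likelihood p for a geometric distribution on {1, 2, ...}: 1 / mean length."""
    return len(lengths) / sum(lengths)


def excess_long_segments(lengths, p, threshold):
    """Return (observed, expected) counts of segments with length >= threshold."""
    observed = sum(1 for x in lengths if x >= threshold)
    expected = len(lengths) * geometric_tail(p, threshold)
    return observed, expected


def sample_geometric(p, rng):
    """Draw one segment length by inverse-transform sampling."""
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1


def demo(seed=0):
    rng = random.Random(seed)
    true_p = 0.05  # hypothetical per-site indel probability, chosen for illustration
    # Stand-in for ancestral repeats: purely neutral segments used to calibrate p.
    neutral = [sample_geometric(true_p, rng) for _ in range(20000)]
    p_hat = fit_indel_prob(neutral)
    # Stand-in for the whole genome: neutral segments plus 200 synthetic long, indel-free ones.
    genome = [sample_geometric(true_p, rng) for _ in range(20000)] + [300] * 200
    return excess_long_segments(genome, p_hat, threshold=150)


if __name__ == "__main__":
    observed, expected = demo()
    print(f"long segments observed: {observed}, expected under neutrality: {expected:.1f}")
```

In this toy setting the observed count of long segments far exceeds the neutral expectation, which is the kind of deviation the highlights attribute to selection against indels in functional regions. A real analysis would also need to handle regional variation in neutral indel rates, which this sketch ignores.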



Introduction

The human genome has been shaped by the evolutionary forces of mutation, genetic drift, and selection, with the latter acting, in the main, to purify functional regions of deleterious mutations. By comparing the human and mouse genomes, it was previously estimated that about 5% of the human genome has undergone fewer point mutations than expected under a neutral substitution model [1,2]. Accepting that this is most likely caused by purifying selection acting on deleterious mutations, the observation implies that at least 5% of the human genome is biologically functional. Comparative methods for closely related species typically analyze substitution patterns to flag conserved regions [7]. These methods are well developed, and they exploit phylogenetic information and correlations along the sequence to achieve high sensitivities. However, they can be hard to calibrate because of incompletely understood variations in neutral rates of substitution due to, for instance, methylation levels, chromatin state, transcriptional activity,

Editor: Steven Henikoff, Fred Hutchinson Cancer Research Center, United States of America

