Abstract

Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5′ end of murid genes. Our analysis also reveals that the mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. Developmental and neuronal genes, among others, are associated with the greatest numbers of putatively functional noncoding sites, whereas genes involved in electron transport and a variety of metabolic processes are associated with fewer. Combining our estimates of the total number of constrained coding and noncoding bases, we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids.
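The step from constrained-site counts to a genomic deleterious mutation rate can be sketched as follows. This is a minimal illustration, assuming the commonly used relation U = 2 × μ × (total constrained sites), where μ is the per-site, per-generation point mutation rate and the factor of 2 accounts for diploidy; it is not necessarily the exact procedure used in the study. The names `SiteClass` and `deleterious_rate_per_diploid` are hypothetical, and all numbers are toy values chosen for easy arithmetic, not the study's estimates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SiteClass:
    """A class of sites (e.g. coding, intronic, intergenic). Hypothetical helper."""
    name: str
    n_sites: float      # number of sites in the class
    constraint: float   # mean selective constraint, expected to lie in [0, 1]

    @property
    def constrained_sites(self) -> float:
        # Effective number of sites at which mutations are removed by selection.
        return self.n_sites * self.constraint


def deleterious_rate_per_diploid(classes: Iterable[SiteClass], mu: float) -> float:
    """Assumed relation: U = 2 * mu * sum of constrained sites over all classes."""
    total_constrained = sum(c.constrained_sites for c in classes)
    return 2.0 * mu * total_constrained


# Toy inputs only (NOT values from the study).
coding = SiteClass("coding", n_sites=1_000, constraint=0.8)
intergenic = SiteClass("intergenic", n_sites=40_000, constraint=0.05)
toy_mu = 1e-4  # deliberately unrealistic mutation rate, for illustration

ratio = intergenic.constrained_sites / coding.constrained_sites
u_total = deleterious_rate_per_diploid([coding, intergenic], toy_mu)
print(f"constrained intergenic : coding = {ratio:.2f}")
print(f"U per diploid genome per generation = {u_total:.3f}")
```

Because U is linear in the number of constrained sites, the share of deleterious mutations falling in each class equals that class's share of constrained sites. Under this formulation, a larger count of constrained intergenic sites translates directly into a larger share of deleterious mutations arising there.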

Highlights

  • Protein-coding genes typically comprise a rather small part of many mammalian genomes [1,2,3,4]

  • This study finds that genes involved in development and the nervous system are typically associated with much higher quantities of functional noncoding DNA, suggesting that these genes require more finely tuned control of their expression

  • The alignments provided a total of 62.50 Mb of ancestral repeat sequence, of which 20.14 Mb was located within introns and the remaining 42.36 Mb was located within intergenic regions


Summary

Introduction

Protein-coding genes typically comprise a rather small part of many mammalian genomes [1,2,3,4]. Subsequent work has indicated that, whilst a number of conserved regions in noncoding DNA sequences may be undiscovered protein-coding genes, or partially overlap with existing genes [9], the evidence does not support a protein-coding function for such conserved regions in many cases [6,11]. It should be noted that sequence conservation per se does not necessarily imply functionality and may reflect variation in the mutation rate [15].
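The caveat about conservation versus functionality can be made concrete with a small sketch. It assumes a standard definition of selective constraint, C = 1 − (observed substitutions / expected substitutions), where the expectation comes from a local neutral reference such as nearby ancestral repeats. The function name `selective_constraint` and all numbers are hypothetical and purely illustrative.

```python
def selective_constraint(observed_subs: float, n_sites: int, neutral_rate: float) -> float:
    """Constraint as the fraction of expected neutral substitutions that are missing.

    neutral_rate is the assumed local neutral divergence per site.
    """
    if n_sites <= 0 or neutral_rate <= 0:
        raise ValueError("n_sites and neutral_rate must be positive")
    expected_subs = neutral_rate * n_sites
    return 1.0 - observed_subs / expected_subs


# Two toy regions with identical raw conservation (100 substitutions in 1000 sites)
# but different local neutral divergence.
low_mutation_region = selective_constraint(100, 1000, neutral_rate=0.10)
high_mutation_region = selective_constraint(100, 1000, neutral_rate=0.20)

print(f"constraint, low-mutation region:  {low_mutation_region:.2f}")
print(f"constraint, high-mutation region: {high_mutation_region:.2f}")
```

Both toy regions look equally conserved, yet only the one embedded in a faster-mutating background shows a deficit of substitutions relative to neutral expectation. This is why conservation is usually judged against a local neutral standard rather than taken at face value.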

