Abstract
Plant genomes contain a large fraction of noncoding sequences. The discovery and annotation of conserved noncoding sequences (CNSs) in plants is an ongoing challenge. Here we report the application of comparative genomics to systematically identify CNSs in 50 well-annotated Gramineae genomes using rice (Oryza sativa) as the reference. We conduct multiple-way whole-genome alignments to the rice genome. The rice genome is annotated as 20 conservation states (CSs) at single-nucleotide resolution using a multivariate hidden Markov model (ConsHMM) based on the multiple-genome alignments. Different states show distinct enrichments for various genomic features, and the conservation scores of CSs are highly correlated with the level of associated chromatin accessibility. We find that at least 33.5% of the rice genome is highly under selection, with more than 70% of the sequence lying outside of coding regions. A catalog of 855,366 regulatory CNSs is generated, and they significantly overlapped with putative active regulatory elements such as promoters, enhancers, and transcription factor binding sites. Collectively, our study provides a resource for elucidating functional noncoding regions of the rice genome and an evolutionary aspect of regulatory sequences in higher plants.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.