Abstract

Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consensus RNA structure prediction. The benchmarking data obtained from two refined structure prediction algorithms, RNAz and SISSIz, were then analyzed to fine-tune the parameters of an optimized workflow for genomic sliding window screens. When applied to consistency-based multiple genome alignments of 35 mammals, our approach confidently identifies >4 million evolutionarily constrained RNA structures using a conservative sensitivity threshold that entails historically low false discovery rates for such analyses (5–22%). These predictions comprise 13.6% of the human genome, 88% of which fall outside any known sequence-constrained element, suggesting that a large proportion of the mammalian genome is functional. As an example, our findings identify both known and novel conserved RNA structure motifs in the long noncoding RNA MALAT1. This study provides an extensive set of functional transcriptomic annotations that will assist researchers in uncovering the precise mechanisms underlying the developmental ontologies of higher eukaryotes.
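
The "genomic sliding window screens" mentioned above can be illustrated with a minimal sketch of how an alignment might be cut into overlapping windows before each one is scored by a structure predictor. The function name `sliding_windows` and the window and step sizes used below are hypothetical, chosen only for illustration. The abstract does not state the parameters of the optimized workflow, and no RNAz or SISSIz call is shown.

```python
from typing import Iterator, Tuple


def sliding_windows(length: int, window: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges tiling an alignment.

    `length` is the number of alignment columns. `window` and `step` are
    illustrative parameters, not values taken from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= 0:
        return
    start = 0
    while True:
        # The final window is truncated at the end of the alignment.
        end = min(start + window, length)
        yield (start, end)
        if end == length:
            break
        start += step


if __name__ == "__main__":
    # Hypothetical example: a 450-column alignment, 200-column windows, 100-column step.
    for start, end in sliding_windows(450, 200, 100):
        print(start, end)
```

Each yielded range would then be passed to a consensus structure predictor. A step smaller than the window makes adjacent windows overlap, so a structure that straddles one window boundary still falls fully inside a neighbouring window.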

Highlights

  • The majority of the human genome is dynamically transcribed into RNA, most of which does not code for proteins [1,2,3,4]

  • The combined data from these reports encompass 9.2% of the human genome, whereas the majority (87.8%) of the Evolutionarily Conserved Structure (ECS) predictions reported lie outside annotated sequence-constrained elements (Figure 4D). We investigated whether this dichotomy was a consequence of ECS predictions derived from primate-specific lineages, which display higher than average sequence homology compared to deeper alignments

  • The findings presented provide novel evidence for widespread functionality acting through RNA secondary structure, under the premise that negative evolutionary selection is a bona fide indicator of molecular function, in conjunction with the fact that the majority of the human genome is transcribed

Introduction

The majority of the human genome is dynamically transcribed into RNA, most of which does not code for proteins [1,2,3,4]. The once common presumption that most non–protein-coding sequences are nonfunctional is being revised in light of increasing evidence that noncoding RNAs (ncRNAs) represent a previously unappreciated layer of gene expression essential for the epigenetic regulation of differentiation and development [5,6,7,8].
