Abstract
Background
Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation. Yet, many homologous structured RNAs display most of their conservation at the secondary or tertiary structure level. As a result, strategies for structured RNA discovery rely heavily on identification of sequences sharing a common stable secondary structure. However, correctly distinguishing structured RNAs from surrounding genomic sequence remains challenging, especially during de novo discovery. RNA also has a long history as a computational model for evolution due to the direct link between genotype (sequence) and phenotype (structure). From these studies it is clear that evolved RNA structures, like protein structures, can be considered robust to point mutations. In this context, an RNA sequence is considered robust if its neutrality (extent to which single mutant neighbors maintain the same secondary structure) is greater than that expected for an artificial sequence with the same minimum free energy structure.
Results
In this work, we bring concepts from evolutionary biology to bear on the structured RNA de novo discovery process. We hypothesize that alignments corresponding to structured RNAs should consist of neutral sequences. We evaluate several measures of neutrality for their ability to distinguish between alignments of structured RNA sequences drawn from Rfam and various decoy alignments. We also introduce a new measure of RNA structural neutrality, the structure ensemble neutrality (SEN). SEN seeks to increase the biological relevance of existing neutrality measures in two ways. First, it uses information from an alignment of homologous sequences to identify a conserved biologically relevant structure for comparison. Second, it only counts base-pairs of the original structure that are absent in the comparison structure and does not penalize the formation of additional base-pairs.
Conclusion
We find that several measures of neutrality are effective at separating structured RNAs from decoy sequences, including both shuffled alignments and flanking genomic sequence. Furthermore, as an independent feature classifier to identify structured RNAs, SEN yields comparable performance to current approaches that consider a variety of features including stability and sequence identity. Finally, SEN outperforms other measures of neutrality at detecting mutational robustness in bacterial regulatory RNA structures.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-014-1203-8) contains supplementary material, which is available to authorized users.
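The two ideas described in the abstract, neutrality as the fraction of single-mutant neighbors that keep the structure, and a one-sided comparison that counts only lost base-pairs, can be illustrated with a minimal Python sketch. This is not the SEN calculation itself, which works on a structure ensemble. It operates on single dot-bracket structures, and the `fold` argument is a hypothetical user-supplied callable (for example, a wrapper around a folding package) that maps a sequence to a dot-bracket string. All function names here are assumptions made for illustration.

```python
def pair_set(dot_bracket):
    """Return the set of (i, j) base-pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def bp_distance(reference, mutant):
    """Symmetric distance: base-pairs present in exactly one structure."""
    return len(pair_set(reference) ^ pair_set(mutant))


def missing_pairs(reference, mutant):
    """One-sided count: reference base-pairs absent from the mutant.

    Extra base-pairs formed by the mutant are not penalized.
    """
    return len(pair_set(reference) - pair_set(mutant))


def single_mutants(sequence, alphabet="ACGU"):
    """Yield every sequence that differs from `sequence` at one position."""
    for i, original in enumerate(sequence):
        for base in alphabet:
            if base != original:
                yield sequence[:i] + base + sequence[i + 1:]


def neutrality(sequence, fold, reference=None):
    """Fraction of single mutants whose structure equals the reference.

    `fold` is a hypothetical callable: sequence -> dot-bracket string.
    If no reference is given, the fold of the wild type is used.
    """
    ref = reference if reference is not None else fold(sequence)
    neighbors = list(single_mutants(sequence))
    unchanged = sum(1 for m in neighbors if fold(m) == ref)
    return unchanged / len(neighbors)


# Toy structures (hypothetical): the mutant keeps every reference pair
# and adds one more, so only the symmetric distance registers a change.
reference = "((((....))))...."
gained = "((((....))))(..)"
lost = ".(((....)))....."
print(bp_distance(reference, gained), missing_pairs(reference, gained))
print(bp_distance(reference, lost), missing_pairs(reference, lost))
```

Passing a structure taken from an alignment as `reference`, rather than relying on the fold of the wild-type sequence, mirrors the first of the two changes described above; how that structure is obtained is left outside this sketch.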
Highlights
Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation.
Reference structure and distance metric impact calculated neutrality. A set of structured RNA alignments derived from RNA Families database (Rfam) seed alignments (Dataset2, Table 1, Additional file 1: Table S1) was used to validate structural ensemble neutrality (SEN) as a measure of neutrality by comparing its performance to other measures that are the basis of most programs designed to capture RNA structural robustness: bp-distance and Pearson’s correlation coefficient (PCC).
Using our modified version of bp-distance, which imports the structure from the alignment, incrementally improves separation of structured RNAs from negative data (0.7654 vs. 0.6293, 0.7229, 0.6692, 0.6692, 0.6618) compared to RNAmute (Figure 1C), demonstrating that using the consensus structure from the alignment improves the accuracy of the structure.
Summary
Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation. RNA has a long history as a computational model for evolution due to the direct link between genotype (sequence) and phenotype (structure). From these studies it is clear that evolved RNA structures, like protein structures, can be considered robust to point mutations. The biological function of structured RNAs often depends on a well-defined three-dimensional shape that is largely determined by interactions between discrete and stable secondary structure elements [6,7,8]. These structural constraints lead to covarying mutations, a conservation pattern characterized by the maintenance of base-pairing interactions involved in RNA secondary structure [9,10].
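The covariation pattern described above can be made concrete with a small Python sketch. It checks whether two alignment columns stay base-paired in every sequence while the nucleotides themselves change. The toy alignment, the function name `covariation_summary`, and the choice to count G-U wobble pairs as pairing are all assumptions for illustration, not part of the original analysis.

```python
# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def covariation_summary(alignment, i, j):
    """Summarize columns i and j of an alignment of equal-length sequences.

    Returns whether every sequence can pair at (i, j) and how many
    distinct nucleotide combinations occur there. Pairing maintained
    across more than one combination is the signature of covariation.
    """
    observed = [(seq[i], seq[j]) for seq in alignment]
    return {
        "all_paired": all(pair in CANONICAL for pair in observed),
        "distinct_pairs": len(set(observed)),
    }


# Hypothetical toy alignment of three short hairpins.
alignment = [
    "GCAAAAGC",
    "ACAAAAGU",
    "CCAAAAGG",
]

print(covariation_summary(alignment, 0, 7))  # pairing kept, nucleotides vary
print(covariation_summary(alignment, 1, 6))  # pairing kept, fully conserved
print(covariation_summary(alignment, 2, 5))  # loop positions, no pairing
```

Columns 0 and 7 show covariation in this toy example, columns 1 and 6 are simply conserved, and columns 2 and 5 do not pair at all.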