Abstract

Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, particularly when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs in which a single nucleotide variant (SNV) causes a large structural disparity, known as riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate the riboSNitch prediction performance of 11 different RNA folding algorithms on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) perform better overall when base pairing probabilities are considered rather than minimum free energy calculations. Although aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest confidence predictions are evaluated independently.

Highlights

  • Accurate ribonucleic acid (RNA) structure prediction remains a contemporary challenge in the field of bioinformatics [1,2,3]

  • Many messenger RNAs and non-coding RNAs have not evolved to adopt rigidly defined structures and instead generally adopt an ensemble of diverse conformations

  • Parallel analysis of RNA structure (PARS) measures the differential reactivity of each nucleotide in a folded RNA to the V1 and S1 RNases, which selectively cleave double- and single-stranded regions, respectively [51]


Summary

Introduction

Accurate RNA structure prediction remains a contemporary challenge in the field of bioinformatics [1,2,3]. Accurate prediction of the accessibility of specific sequence motifs in transcripts plays a decisive role in understanding post-transcriptional regulation, as transcript secondary structure can impact the binding of RNA binding proteins, ribosomes and miRNAs [16,17,18,19,20,21]. Because many messenger and non-coding RNAs adopt a wide range of structures, traditional structural benchmarking is complicated by the fact that experimental techniques to determine an ensemble of structures do not exist for large RNAs. An alternative strategy is to benchmark how well folding algorithms predict the perturbation of the structural ensemble by particular mutations [22]. A comprehensive and consistent RNA structure data set covering a large number of mutations in mRNA transcripts was not available until very recently [23].
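As a rough illustration of what "predicting the perturbation of the structural ensemble" can mean in practice, the sketch below compares per-nucleotide base pairing probabilities for a wild-type and a mutant sequence using one minus their Pearson correlation. This is a minimal sketch under stated assumptions: the function names and the input values are hypothetical, the probabilities are assumed to come from some folding program's partition function output, and the score is not claimed to be the exact metric used by any algorithm named in this article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def structural_disruption(wt_pairing, mut_pairing):
    """Hypothetical disruption score: 1 - Pearson correlation.

    Each input is a list with one value per nucleotide giving the
    probability that the nucleotide is base paired. A score near 0 means
    the two profiles have the same shape; larger scores mean the variant
    changes the predicted ensemble more.
    """
    return 1.0 - pearson(wt_pairing, mut_pairing)


# Made-up pairing probabilities, for illustration only.
wild_type = [0.1, 0.9, 0.8, 0.2]
mutant = [0.9, 0.1, 0.2, 0.8]
score = structural_disruption(wild_type, mutant)
```

Ranking variants by a score of this kind and comparing the ranking with experimentally labelled riboSNitches and non-riboSNitches is one way such a benchmark could be framed; correlation is used here because it depends on the shape of the profile rather than its absolute scale.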

