Abstract

BackgroundAccurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches.ResultsIn this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures.ConclusionsOur new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

Highlights

  • Accurate structure prediction methods play an important role for the understanding of RNA function

  • CONTRAfold 1.1 performs slightly better than T99 on our benchmark set, but the performance advantage is not statistically significant; we believe that the reason for this lies primarily in the fact that it was trained on a small set of RNAs not representative of the broad range of structures found in S-STRAND2

  • MaxExpect and Centroidfold do perform significantly better than T99, but fall short of the performance achieved by CONTRAfold 2.0

Read more

Summary

Introduction

Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. The largest improvement in prediction accuracy has resulted from the use of sophisticated methods for estimating the thermodynamic parameters of a given energy model (in particular, the Turner model), based on a set of reliable RNA secondary structures [3,4,5]. Improved prediction accuracy has been reported for an approach that determines structures with maximum expected accuracy (MEA) rather than minimum free energy, based on base pairing probabilities obtained from a partition-function calculation [3,6,7]. CentroidFold uses a similar strategy except that it uses a weighted some of true positives and true negatives as the objective function [7]

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call