Abstract

Minimum free energy prediction of RNA secondary structures is based on the Nearest Neighbor Thermodynamic Model. While such predictions are typically good, the accuracy can vary widely even for short sequences, and the branching thermodynamics are an important factor in this variance. Recently, the simplest model for multiloop energetics (a linear function of the number of branches and unpaired nucleotides) was found to perform best. Subsequently, a parametric analysis demonstrated that per-family accuracy can be improved by changing the weightings in this linear function. However, the extent of improvement was not known because the new parameters were found by an ad hoc method. Here we develop a branch-and-bound algorithm that finds the set of optimal parameters with the highest average accuracy for a given set of sequences. Our analysis shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets. Cross-family improvement is also possible but more difficult, because competing parameter regions favor different families. The results also indicate that restricting the unpaired nucleotide penalty to small values is warranted. This restriction of the parameter space makes analyzing longer sequences with the present techniques more feasible.
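As a concrete reading of the linear multiloop model described above, the sketch below scores one multiloop as an initiation penalty plus a per-unpaired-nucleotide penalty and a per-branch penalty. The names `a`, `b`, `c`, `MultiloopParams`, `multiloop_energy`, `branching_signature` and `total_multiloop_energy` are hypothetical, and counting the closing helix as one of the branches is an assumed convention. Because the model is linear, a structure's total multiloop contribution is the dot product of the parameter triple with a per-structure count vector, which is the property a parametric analysis over the three weightings relies on.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MultiloopParams:
    """Hypothetical container for the three linear multiloop weightings."""
    a: float  # initiation penalty for closing a multiloop
    b: float  # penalty per unpaired nucleotide inside the loop
    c: float  # penalty per branching helix (closing helix included, by assumption)


def multiloop_energy(params: MultiloopParams, branches: int, unpaired: int) -> float:
    """Free energy contribution of a single multiloop under the linear model."""
    if branches < 3:
        raise ValueError("a multiloop has at least three branches")
    if unpaired < 0:
        raise ValueError("unpaired count must be non-negative")
    return params.a + params.b * unpaired + params.c * branches


def branching_signature(loops: Iterable[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Collapse the (branches, unpaired) counts of every multiloop in a
    structure into the three totals that multiply a, b and c."""
    loops = list(loops)
    return (
        len(loops),                                   # coefficient of a
        sum(unpaired for _, unpaired in loops),       # coefficient of b
        sum(branches for branches, _ in loops),       # coefficient of c
    )


def total_multiloop_energy(params: MultiloopParams, loops: Iterable[Tuple[int, int]]) -> float:
    """Total multiloop energy as a dot product of (a, b, c) with the signature."""
    n_loops, n_unpaired, n_branches = branching_signature(loops)
    return params.a * n_loops + params.b * n_unpaired + params.c * n_branches
```

For example, `branching_signature([(3, 5), (4, 2)])` returns `(2, 7, 7)`, so a structure with one three-way and one four-way multiloop carrying 7 unpaired nucleotides in total contributes 2a + 7b + 7c for any choice of weightings.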

Highlights

  • Accurate prediction of RNA base pairings from sequence remains a fundamental problem in bioinformatics

  • We first address the extent of improvement possible when the branching parameters are trained on a specific family, either transfer RNA (tRNA) or 5S ribosomal RNA (rRNA)

  • Previous results [17] demonstrated that it was possible to achieve a statistically significant improvement in minimum free energy (MFE) prediction accuracy by altering the three Nearest Neighbor Thermodynamic Model (NNTM) parameters which govern the entropic cost of loop branching
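The summary does not say how prediction accuracy is scored. A common choice for base-pair predictions, assumed here, is the F-measure (harmonic mean of precision and recall over predicted versus reference pairs), averaged over the sequences in a set. The helper names below are hypothetical, and structures are assumed to be pseudoknot-free dot-bracket strings.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j, 1-based positions


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Parse a pseudoknot-free dot-bracket string into a set of base pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def f_measure(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """Harmonic mean of base-pair precision and recall (0.0 if no pair matches)."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def average_accuracy(structures: Iterable[Tuple[str, str]]) -> float:
    """Mean F-measure over (predicted, reference) dot-bracket pairs."""
    scores = [
        f_measure(pairs_from_dot_bracket(pred), pairs_from_dot_bracket(ref))
        for pred, ref in structures
    ]
    if not scores:
        raise ValueError("need at least one sequence")
    return sum(scores) / len(scores)
```

For instance, predicting `(....)` against the reference `((..))` recovers one of two reference pairs with no false positives, giving precision 1, recall 0.5 and an F-measure of about 0.667.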

Introduction

Accurate prediction of RNA base pairings from sequence remains a fundamental problem in bioinformatics. Here we focus on the three Nearest Neighbor Thermodynamic Model (NNTM) parameters that govern the entropic cost of branching, a critical aspect of the overall molecular configuration. They are among the few NNTM parameters not based on experimental data, and so are reasonable candidates for a targeted reevaluation. We consider two families of RNA molecules: transfer RNA (tRNA) and 5S ribosomal RNA (rRNA). Their sequence lengths are amenable to our current methods, while providing two different branching configurations to analyze. This enables us to confirm the extent of accuracy improvement possible on a per-family basis, while illustrating the challenges of obtaining such improvements simultaneously over two (or more) families.
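The tension between per-family and cross-family optimization can be illustrated with a brute-force comparison over a finite list of candidate weightings. This is an exhaustive search over a given candidate list, not the branch-and-bound algorithm the paper develops, and it assumes the average accuracy of each family at each candidate triple has already been computed. The function `best_params` and the layout of the score table are hypothetical.

```python
from statistics import fmean
from typing import Dict, Iterable, Tuple

Params = Tuple[float, float, float]  # hypothetical (a, b, c) multiloop weightings


def best_params(
    scores: Dict[Params, Dict[str, float]], families: Iterable[str]
) -> Tuple[Params, float]:
    """Return the candidate triple with the highest mean family accuracy.

    scores[params][family] holds a precomputed average accuracy for that
    family at that parameter triple. Each family is weighted equally, and
    ties go to the candidate listed first.
    """
    fams = list(families)
    if not fams or not scores:
        raise ValueError("need at least one family and one candidate")
    best = max(scores, key=lambda p: fmean(scores[p][f] for f in fams))
    return best, fmean(scores[best][f] for f in fams)
```

Calling it with only the tRNA family, only the 5S rRNA family, and both families together can return three different triples when the regions favoring each family do not overlap. Weighting families equally, rather than pooling all sequences, keeps the larger family from dominating the combined objective; either choice is an assumption of this sketch.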
