Abstract

Background: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is: what small, simple SCFG designs perform best for RNA secondary structure prediction?

Results: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures.

Conclusions: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others.

Highlights

  • Most RNA secondary structure prediction algorithms are based on energy minimization [2,3,4,5,6,7]

  • Our goal is to identify lightweight stochastic context-free grammar (SCFG) designs that can serve as cores underlying more complex integrated approaches

Summary

Introduction

RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs) [8,9,10]. Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. An outstanding problem is consensus RNA secondary structure prediction for a small number of structurally homologous RNA sequences.
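To make the notion of "free parameters" concrete, the following minimal sketch counts them for the Knudsen and Hein design mentioned in the abstract. The production rules shown are the form in which that grammar is usually quoted and are an assumption here, since this summary does not list them. The helper `count_free_parameters`, the dictionary `KH_GRAMMAR`, and the choice to share one single-base and one base-pair emission distribution across all rules are likewise illustrative assumptions, not the authors' implementation.

```python
# Sketch only. The grammar below is the form usually quoted for the
# Knudsen-Hein (PFOLD) design; it is an assumption, not taken from this summary.
#   S -> L S | L
#   L -> a F a' | a
#   F -> a F a' | L S
KH_GRAMMAR = {
    "S": ["L S", "L"],
    "L": ["a F a'", "a"],
    "F": ["a F a'", "L S"],
}

ALPHABET_SIZE = 4  # A, C, G, U


def count_free_parameters(grammar, alphabet_size=ALPHABET_SIZE):
    """Count free parameters, assuming one shared single-base emission
    distribution and one shared base-pair emission distribution."""
    # The rule probabilities of each nonterminal sum to 1,
    # so one probability per nonterminal is determined by the others.
    transitions = sum(len(rules) - 1 for rules in grammar.values())
    # Same normalization argument for the emission distributions.
    single_emissions = alphabet_size - 1
    pair_emissions = alphabet_size ** 2 - 1
    return {
        "transitions": transitions,
        "single_emissions": single_emissions,
        "pair_emissions": pair_emissions,
        "total": transitions + single_emissions + pair_emissions,
    }


counts = count_free_parameters(KH_GRAMMAR)
print(counts)
```

Under these assumptions the count is 3 transition, 3 single-base emission, and 15 base-pair emission parameters, which totals the 21 quoted in the abstract. A grammar with more nonterminals, or with separate emission distributions per rule, would grow this count quickly, which is the complexity concern the text raises.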
