TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

Arif O Harmanci, Gaurav Sharma, David H Mathews

doi:10.1186/1471-2105-12-108

Abstract

Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented.

Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold.

TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms.

Conclusions: TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
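
The iterative procedure described in the abstract can be illustrated with a small toy sketch. It is not the published TurboFold equations and not the RNAstructure API: the function names (`extrinsic_info`, `reweight`, `turbo_iterate`), the weight `gamma`, and the data layout are assumptions made for illustration. In particular, `reweight` is a hypothetical stand-in for the partition function computation with free energy modifications; it simply scales each intrinsic pairing probability by the support it receives from the other sequences and clamps the result at 1.

```python
# Toy sketch of a TurboFold-style iteration (illustrative assumptions only).
# P[s][i][j]  : estimated probability that positions i and j of sequence s pair.
# C[(m, s)][i][k]: probability that position i of sequence m is co-incident
#                  (aligned) with position k of sequence s.

def extrinsic_info(m, P, C):
    """Map the other sequences' pairing probabilities onto sequence m."""
    n = len(P[m])
    ext = [[0.0] * n for _ in range(n)]
    for s in range(len(P)):
        if s == m:
            continue  # extrinsic information comes only from the other sequences
        co = C[(m, s)]
        ns = len(P[s])
        for i in range(n):
            for j in range(n):
                total = 0.0
                for k in range(ns):
                    for l in range(ns):
                        total += co[i][k] * P[s][k][l] * co[j][l]
                ext[i][j] += total
    return ext

def reweight(intrinsic, ext, gamma=0.5):
    """Hypothetical stand-in for refolding with modified free energies."""
    n = len(intrinsic)
    return [[min(1.0, intrinsic[i][j] * (1.0 + gamma * ext[i][j]))
             for j in range(n)] for i in range(n)]

def turbo_iterate(intrinsic, C, iterations=3):
    """Alternate between computing extrinsic information and updating P."""
    P = [[row[:] for row in p] for p in intrinsic]
    for _ in range(iterations):
        # Extrinsic information uses the estimates from the previous iteration.
        ext = [extrinsic_info(m, P, C) for m in range(len(P))]
        P = [reweight(intrinsic[m], ext[m]) for m in range(len(P))]
    return P
```

In this sketch a pair that is well supported in a homolog raises the estimate for the aligned pair in the sequence of interest, while pairs with zero intrinsic probability stay at zero. A real implementation would replace `reweight` with a nearest neighbor partition function calculation.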

Highlights

The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence.
Accurate prediction of RNA secondary structure improves computational methods that scan genomes for novel ncRNA genes [4,10,11,12,13,14]. These methods use structure prediction to test for conserved secondary structure across genomes, which, in turn, suggests that the sequence regions corresponding to conserved structural regions form homologous ncRNA genes.
Comparative sequence analysis takes as input multiple homologous RNA sequences and predicts a consensus secondary structure.

Summary

Introduction

The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. Comparative sequence analysis methods [21] that utilize a large number of homologs for RNA folding currently offer the most accurate prediction of secondary structure. Computational methods for structure prediction using multiple homologous sequences can be thought of as attempts to automate comparative sequence analysis, typically with a much smaller number of input sequences. A recent comprehensive review of computational methods for structure prediction for multiple sequences can be found in [22].



Journal: BMC Bioinformatics	Publication Date: Apr 20, 2011
Citations: 85	License type: cc-by


Similar Papers

The influence of dataset homology and a rigorous evaluation strategy on protein secondary structure prediction.
Teng-Ruei Chen ... Wei-Cheng Lo
PloS one | VOL. 16
14 Jul 2021

Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data
Zhen Tan ... David H Mathews
Biophysical Journal | VOL. 113
01 Jul 2017

Prediction of RNA secondary structures: from theory to models and real molecules
Peter Schuster
Reports on Progress in Physics | VOL. 69
18 Apr 2006
