Faster algorithms for RNA-folding using the Four-Russians method.

Balaji Venkatachalam, Dan Gusfield, Yelena Frid

doi:10.1186/1748-7188-9-5

Open Access

https://doi.org/10.1186/1748-7188-9-5


Abstract

Background

The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using Nussinov's dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time of certain dynamic programming algorithms by a multiplicative factor after a preprocessing step in which solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation.

Theoretical results

We simplify the algorithm and the analysis by doing the preprocessing once, prior to the algorithm computation. We call this the two-vector method. We also show variants where, instead of exhaustive preprocessing, we solve only the subproblems encountered in the main algorithm, once each, and memoize the results. We give a simple proof of correctness and explore the practical advantages over the earlier method. The Nussinov algorithm admits an O(n^2) time parallel algorithm. We show a parallel algorithm using the two-vector idea that improves the time bound to O(n^2/log n).

Practical results

We have implemented the parallel algorithm on graphics processing units using the CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running times. The ideas used to organize the data structures also help improve the running time of the serial algorithms. For sequences of length up to 6000 bases, the parallel algorithm takes only about 2.5 seconds, and the two-vector serial method takes about 57 seconds on a desktop and 15 seconds on a server. Among the serial algorithms, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and are faster than Nussinov by up to a factor of 20.
The source-code for the algorithms is available at http://github.com/ijalabv/FourRussiansRNAFolding.
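As a point of reference for the O(n^3) baseline mentioned above, the following is a minimal Python sketch of the Nussinov recurrence, D(i, j) = max{ D(i+1, j-1) + b(i, j), max over i < k <= j of D(i, k-1) + D(k, j) }. It is not taken from the authors' repository. The function name `nussinov`, the `PAIRS` set, the restriction to Watson-Crick pairs (no G-U wobble pairs), and the absence of a minimum loop length are all assumptions made for illustration.

```python
# Assumption: only Watson-Crick pairs count as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov(seq):
    """Return the maximum number of non-crossing complementary pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # D[i][j] holds the optimum for the substring seq[i..j]; cells with i >= j stay 0.
    D = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill by increasing substring length
        for i in range(n - span):
            j = i + span
            # Case 1: pair i with j (if complementary) around the inner substring.
            inner = D[i + 1][j - 1] if span >= 2 else 0
            best = inner + (1 if (seq[i], seq[j]) in PAIRS else 0)
            # Case 2: split into seq[i..k-1] and seq[k..j]; this inner loop
            # is the O(n) step that the Four-Russians method speeds up.
            for k in range(i + 1, j + 1):
                best = max(best, D[i][k - 1] + D[k][j])
            D[i][j] = best
    return D[0][n - 1]
```

For example, `nussinov("GGGAAACCC")` pairs the three G bases with the three C bases in a nested fashion and returns 3.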

Highlights

The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using Nussinov's dynamic programming algorithm.
We show a parallel algorithm using the two-vector idea that improves the time bound from O(n^2) to O(n^2/log n).
The Four-Russians method, named after Arlazarov et al. [3], is a method to speed up certain dynamic programming algorithms.

Summary

Background

Computational approaches to find the secondary structure of RNA molecules are used extensively in bioinformatics applications.

In the Frid-Gusfield (FG) algorithm, after computing the q contiguous cells of a group in a row, the value in the initial cell D(i, p) and the horizontal difference vector v_p are known. FG then runs the preprocessing algorithm on page 3 for this fixed vector v_p together with all possible vertical difference vectors. In FG the preprocessing step is run once for each group of each row, even if the vector pair was seen earlier. This is because each table entry includes the value of the initial cell of the group, D(i, p), so an entry computed for one group cannot simply be reused for another. Since the preprocessing is done for every group of every row, the same horizontal vector can be seen multiple times in the table. This leads to duplicated work and a slower running time than the two-vector algorithm.
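The contrast with the two-vector method can be illustrated with a small Python sketch. It assumes the usual Four-Russians setting in which consecutive cells of a row group increase by 0 or 1 and consecutive cells of the matching column group decrease by 0 or 1. The names `build_two_vector_table` and `group_max`, and the choice to measure both vectors from the first cell of each group, are illustrative assumptions and may differ from the indexing used in the paper. The point is that the table is keyed only by the pair of difference vectors, so it is built once and the initial cell values are added at lookup time.

```python
from itertools import product


def build_two_vector_table(q):
    """Precompute, for every pair of 0/1 difference vectors of length q - 1,
    the maximum over k of the running sum of (h[t] - v[t]) for t < k.
    The result does not depend on any initial cell value."""
    table = {}
    for h in product((0, 1), repeat=q - 1):
        for v in product((0, 1), repeat=q - 1):
            best = running = 0            # k = 0 contributes an offset of 0
            for ht, vt in zip(h, v):
                running += ht - vt
                best = max(best, running)
            table[h, v] = best
    return table


def group_max(table, row_cells, col_cells):
    """Return max over k of row_cells[k] + col_cells[k] with one table lookup.
    row_cells must rise by 0 or 1 per step; col_cells must fall by 0 or 1."""
    h = tuple(b - a for a, b in zip(row_cells, row_cells[1:]))
    v = tuple(a - b for a, b in zip(col_cells, col_cells[1:]))
    # The initial cells are added here, outside the table, so the same
    # entry serves every group that shares this vector pair.
    return row_cells[0] + col_cells[0] + table[h, v]


table = build_two_vector_table(4)
print(group_max(table, [3, 3, 4, 5], [4, 3, 3, 2]))  # 7
```

With q = 4 the table has 4^3 = 64 entries. A memoized variant, in the spirit of the one described in the abstract, would fill `table[h, v]` lazily on first use instead of enumerating all pairs up front.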

7: Synchronize with other processes

Conclusions

Akutsu T