PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

Sayed Mohammad Ebrahim Sahraeian, Byung-Jun Yoon

doi:10.1186/1471-2105-12-s1-s38

Abstract

Background: Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted growing attention as recent studies have unveiled the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms do not scale efficiently to multiple sequences, progressive schemes are mostly used to reduce the complexity. However, this approach tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA, which constructs an accurate alignment in a non-progressive fashion.

Results: Here, we propose PicXAA-R as an extension of PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently captures both the folding information within each sequence and the local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment, starting from sequence regions with high base-pairing and base alignment probabilities.

Conclusions: Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
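As a rough illustration of what a probabilistic consistency transformation does, the sketch below re-estimates the alignment posteriors between two sequences by routing evidence through every other sequence in the set. It follows the well-known ProbCons-style update, P'_xy = (1/|S|) Σ_z P_xz P_zy, and is an assumption made for illustration; it is not taken from the PicXAA-R source code, and the function names and the `post` dictionary layout are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def pair_posterior(post, a, b):
    """Posterior matrix for sequences a and b; post is keyed by (i, j) with i < j."""
    return post[(a, b)] if a < b else transpose(post[(b, a)])


def consistency_transform(post, x, y, n_seqs):
    """ProbCons-style update (illustrative assumption, not the authors' exact rule).

    The terms z == x and z == y each contribute the direct posterior P_xy,
    because a sequence aligns to itself with the identity matrix.
    """
    direct = pair_posterior(post, x, y)
    total = [[2.0 * v for v in row] for row in direct]
    for z in range(n_seqs):
        if z in (x, y):
            continue
        via = matmul(pair_posterior(post, x, z), pair_posterior(post, z, y))
        total = [[t + v for t, v in zip(tr, vr)] for tr, vr in zip(total, via)]
    return [[t / n_seqs for t in row] for row in total]


# Toy example: three sequences of length 2. The direct evidence for the
# pair (0, 1) is weak, but both sequences align confidently to sequence 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
post = {
    (0, 1): [[0.5, 0.0], [0.0, 0.5]],
    (0, 2): identity,
    (1, 2): identity,
}
updated = consistency_transform(post, 0, 1, 3)
```

In this toy case the support for aligning position 0 of sequence 0 with position 0 of sequence 1 rises from 0.5 to 2/3, because the third sequence agrees with that pairing.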

Highlights

Accurate and efficient structural alignment of non-coding RNAs has attracted growing attention as recent studies have unveiled the significance of ncRNAs in living organisms
Wang et al. [44] designed two types of datasets to verify the potential of RNA sequence aligners in dealing with local similarities in the alignment set: (1) BraliSub, the subsets of BraliBase 2.1 with high variability; (2) LocalExtR, an extension of BraliBase 2.1 consisting of a total of 90 large-scale reference alignments categorized into the k20, k40, k60, and k80 reference sets, with 20, 40, 60, and 80 sequences in each alignment, respectively
For the Matthews correlation coefficient (MCC) score, which balances sensitivity and specificity, PicXAA-R outperforms CentroidAlign by 0.8%
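For readers unfamiliar with the MCC, the following minimal sketch computes it from the counts of true and false positives and negatives, using the standard textbook definition. The function name `mcc` and the example counts are illustrative assumptions, and the paper may compute the score from predicted base pairs with a slightly different convention.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    (assumed here) for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not taken from the paper.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
mixed = mcc(tp=6, tn=3, fp=1, fn=2)
```

Because the score combines all four counts, it stays informative even when positives and negatives are strongly imbalanced, which is why it is often reported alongside sensitivity and specificity.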

Summary

Introduction

Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted growing attention as recent studies have unveiled the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms do not scale efficiently to multiple sequences, progressive schemes are mostly used to reduce the complexity. Murlet [12], RAF [13], PARTS [14], STRAL [15], LocARNA [16], CentroidAlign [17], and PMcomp [18] exploit probabilistic approaches by implementing base-pairing probabilities in a restricted Sankoff-style framework or by employing the Needleman-Wunsch algorithm with structural scores. Although these variants of Sankoff’s algorithm significantly reduce the time and memory complexities, they still cannot directly find the structural alignment of multiple sequences. Instead, these algorithms build up the multiple sequence alignment (MSA) by progressively combining pairwise structural alignments along a guide tree.

Methods

Results

Conclusion