Abstract

Multiple sequence alignment is a crucial task in a number of biological analyses, such as secondary structure prediction, domain searching, and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its accuracy comes at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We selected the two most time-consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests performed on several protein families from the Pfam database give an overall speed-up of 6.7. Compared to other algorithms such as MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally, we introduce a tuned variant of QuickProbs which is significantly more accurate than MSAProbs on sets of distantly related sequences without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, so the package is suitable for graphics processors produced by all major vendors.
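To make the consistency transformation stage more concrete, the sketch below shows the classic unweighted ProbCons-style formulation, in which the posterior matrix of every sequence pair (x, y) is re-estimated through all sequences z in the set S: P'_xy = (1/|S|) * sum over z of P_xz * P_zy, with P_xx taken as the identity. This is an illustrative assumption, not the MSAProbs or QuickProbs code; MSAProbs's own variant may differ (for example by weighting sequences), and the names consistency_transform and lookup are hypothetical.

```python
from typing import Dict, List, Tuple

# Dense posterior matrix: rows index residues of sequence x, columns residues of y.
Matrix = List[List[float]]
Posteriors = Dict[Tuple[int, int], Matrix]


def identity(n: int) -> Matrix:
    """P_xx is taken to be the identity: each residue aligns with itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b (a is r x k, b is k x c)."""
    cols = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def lookup(post: Posteriors, lengths: List[int], x: int, y: int) -> Matrix:
    """Return P_xy, using P_xx = I and P_yx = transpose(P_xy)."""
    if x == y:
        return identity(lengths[x])
    if (x, y) in post:
        return post[(x, y)]
    return transpose(post[(y, x)])


def consistency_transform(post: Posteriors, lengths: List[int]) -> Posteriors:
    """One round of P'_xy = (1/|S|) * sum_z P_xz @ P_zy for every pair x < y."""
    n = len(lengths)
    out: Posteriors = {}
    for x in range(n):
        for y in range(x + 1, n):
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            # Relaxation through every sequence z, including z == x and z == y.
            for z in range(n):
                prod = matmul(lookup(post, lengths, x, z), lookup(post, lengths, z, y))
                for i, row in enumerate(prod):
                    for j, v in enumerate(row):
                        acc[i][j] += v
            out[(x, y)] = [[v / n for v in row] for row in acc]
    return out


if __name__ == "__main__":
    lengths = [2, 2, 2]
    post = {
        (0, 1): [[0.9, 0.0], [0.0, 0.8]],
        (0, 2): [[1.0, 0.0], [0.0, 1.0]],
        (1, 2): [[1.0, 0.0], [0.0, 1.0]],
    }
    new = consistency_transform(post, lengths)
    print(round(new[(0, 1)][0][0], 4))  # 0.9333
```

Each (x, y) pair is computed independently of all the others, and each pair involves a batch of matrix products, which is the kind of data parallelism graphics processors handle well. How QuickProbs actually maps this stage onto OpenCL kernels is not shown here.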

Highlights

  • Multiple sequence alignment (MSA) is an essential task in molecular biology

  • We selected MSAProbs, the most accurate of the existing MSA methods, as our starting point and developed QuickProbs, a variant of the MSAProbs algorithm suited for graphics processors that preserves the outstanding accuracy of its predecessor

Introduction

Multiple sequence alignment (MSA) is an essential task in molecular biology. It is performed on both nucleotide and protein sequences. Its applications include phylogenetic analyses, gene finding, identification of functional domains, prediction of secondary structures, and many others. The increasing size of sequence databases, driven by the development of high-throughput sequencing technologies, gives biologists the opportunity to analyse enormous data sets in silico. This creates constant pressure to develop more accurate and faster MSA algorithms. One of the most popular multiple sequence alignment programs is ClustalW [6]. It is a classic representative of progressive algorithms and works according to the following scheme:
1. Estimate evolutionary distances between all pairs of sequences
