Abstract

Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as to detect structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level parallelism making use of available CPU cores for computation and a fine-grain level parallelism exploiting bit-level concurrency as well as vector instructions.

Highlights

  • A protein’s three-dimensional structure tends to be evolutionarily better preserved than its sequence

  • The structural alignment problem is to find the mapping that is optimal with respect to the scoring function

  • The complexity of the protein structural alignment problem and the quality of the found solution strongly depend on the way that scoring function is defined

Read more

Summary

Introduction

A protein’s three-dimensional structure tends to be evolutionarily better preserved than its sequence. Scientific Programming as an NP-hard problem, for example, the protein threading problem [9], the problem of enumerating all maximal cliques [10, 11], or finding a maximum clique [12,13,14] These results have been generalized by Kolodny and Linial [1], who showed that protein structural alignment is NP-hard if the similarity score is distance based. None of them is close to the approach proposed here As they are all heuristic and do not guarantee finding an optimal alignment, a detailed comparison with algorithms based on different concepts requires extensive numerical experiments and is outside the scope of this study. Additional sections are added including a comparison between the straightforward and the bit-vector implementations based on complexity analysis as well as detailed analysis of the work from the point of view of future performance improvements and additional possible applications

Preliminaries
Methods
Parallelism
Experimental Evaluation
Conclusion and Perspectives
Disclosure
Findings
Diversifying the Applications
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call