Parallel Seed-Based Approach to Multiple Protein Structure Similarities Detection

Guillaume Chapuis, Rumen Andonov, Mathilde Le Boudic-Jamin, Dominique Lavenier, Hristo Djidjev

doi:10.1155/2015/279715

Abstract

Finding similarities between protein structures is a crucial task in molecular biology. Most existing tools require proteins to be aligned in an order-preserving way and find only a single alignment even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee: the returned alignments, if any exist, have both root mean squared deviations (coordinate based as well as internal-distance based) lower than a given threshold. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: coarse-grain parallelism that makes use of the available CPU cores, and fine-grain parallelism that exploits bit-level concurrency as well as vector instructions.
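The two deviations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the normalization of the internal-distance RMSD (averaged over unordered pairs of aligned residues), and the assumption that the second structure is already superimposed on the first are all choices made here for illustration. An alignment is given as a list of index pairs, which need not be order preserving.

```python
import math
from itertools import combinations

def rmsd_coordinates(a, b, pairs):
    """Coordinate-based RMSD over aligned residue pairs.

    Assumption: b has already been superimposed on a; no optimal
    rotation or translation is computed in this sketch.
    """
    total = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))

def rmsd_internal_distances(a, b, pairs):
    """Internal-distance RMSD: compares distances inside a with the
    corresponding distances inside b, over all unordered pairs of
    aligned residues (assumed normalization). Needs >= 2 aligned pairs.
    """
    diffs = [
        (math.dist(a[i], a[k]) - math.dist(b[j], b[l])) ** 2
        for (i, j), (k, l) in combinations(pairs, 2)
    ]
    return math.sqrt(sum(diffs) / len(diffs))

def passes_threshold(a, b, pairs, tau):
    """True if both deviations are lower than the threshold tau."""
    return (rmsd_coordinates(a, b, pairs) < tau
            and rmsd_internal_distances(a, b, pairs) < tau)

# Hypothetical toy data: b holds the same three points as a, stored in
# a different order, so the matching alignment is nonsequential.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pairs = [(0, 1), (1, 2), (2, 0)]

# The same points shifted by one unit along x: internal distances are
# unchanged, but the coordinate RMSD grows because no superposition is done.
b_shifted = [(x + 1.0, y, z) for x, y, z in b]
```

The internal-distance measure is invariant under rigid motions, which is why it stays at zero for the shifted copy, whereas the coordinate-based measure depends on the superposition.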

Highlights

A protein’s three-dimensional structure tends to be evolutionarily better preserved than its sequence
The structural alignment problem is to find the mapping that is optimal with respect to the scoring function
The complexity of the protein structural alignment problem and the quality of the solution found strongly depend on how the scoring function is defined

Summary

Introduction

A protein’s three-dimensional structure tends to be evolutionarily better preserved than its sequence. Structural alignment is often modeled as an NP-hard problem, for example, the protein threading problem [9], the problem of enumerating all maximal cliques [10, 11], or finding a maximum clique [12, 13, 14]. These results have been generalized by Kolodny and Linial [1], who showed that protein structural alignment is NP-hard if the similarity score is distance based. None of the existing tools is close to the approach proposed here. As they are all heuristic and do not guarantee finding an optimal alignment, a detailed comparison with algorithms based on different concepts requires extensive numerical experiments and is outside the scope of this study. Additional sections have been added, including a comparison between the straightforward and the bit-vector implementations based on complexity analysis, as well as a detailed analysis of the work from the point of view of future performance improvements and additional possible applications.
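The contrast between a straightforward and a bit-vector implementation can be illustrated generically. The sketch below is an assumption-laden illustration, not the paper's code: it stores each vertex's neighborhood in a graph either as a Python set or as an integer used as a bit vector, and computes the common neighbors of two vertices, a basic step in clique-based methods. All names (`to_bitmask`, `from_bitmask`, `common_neighbors_sets`, `common_neighbors_bits`) and the toy graph are hypothetical.

```python
def to_bitmask(vertices):
    """Encode a set of vertex indices as an integer bit vector."""
    mask = 0
    for v in vertices:
        mask |= 1 << v
    return mask

def from_bitmask(mask):
    """Decode an integer bit vector back into a set of vertex indices."""
    vertices = set()
    v = 0
    while mask:
        if mask & 1:
            vertices.add(v)
        mask >>= 1
        v += 1
    return vertices

def common_neighbors_sets(adj, u, v):
    """Straightforward version: set intersection, one element at a time."""
    return adj[u] & adj[v]

def common_neighbors_bits(adj_bits, u, v):
    """Bit-vector version: a single bitwise AND handles many vertices
    per machine word."""
    return adj_bits[u] & adj_bits[v]

# Hypothetical toy graph given as adjacency sets.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
adj_bits = {u: to_bitmask(nbrs) for u, nbrs in adj.items()}

shared_sets = common_neighbors_sets(adj, 0, 2)
shared_bits = from_bitmask(common_neighbors_bits(adj_bits, 0, 2))
```

In a compiled language the same idea maps onto fixed-width words, so one AND instruction processes 64 candidate vertices at once; Python integers only mimic that behavior.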

Preliminaries

Methods

Parallelism

Experimental Evaluation

Conclusion and Perspectives

Disclosure

Findings

Diversifying the Applications

Journal: Scientific Programming
Publication Date: Jan 1, 2015
Citations: 26
License type: CC BY 3.0

Similar Papers

Non-sequential alignments in protein structure comparison
Alexej Abyzov
10 May 2021

Parallel Seed-Based Approach to Protein Structure Similarity Detection
Guillaume Chapuis ... Rumen Andonov
01 Jan 2014

A comprehensive analysis of non-sequential alignments between all protein structures
Alexej Abyzov ... Valentin A Ilyin
BMC structural biology | VOL. 7
01 Jan 2007

MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, Cα only models, Alternative alignments, and Non-sequential alignments
Shintaro Minami ... Kengo Sawada
BMC bioinformatics | VOL. 14
18 Jan 2013
