Abstract

Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into functional similarities between proteins. Existing structure alignment tools adopt a two-stage approach, decoupling the assignment evaluation and structure superposition problems and iterating between them. We introduce a novel approach, SAS-Pro, which addresses assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methods to search for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with a high degree of similarity to known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.

Introduction

Protein alignment has gained tremendous attention in bioinformatics and proteomics due to its applicability in protein clustering, identifying homology relationships, and inferring structure-activity relationships about new and existing proteins. Research on protein sequence alignment has led to the development of numerous dynamic programming algorithms [1,2] that are central to the BLAST code [3,4], an alignment tool that radically transformed the bioinformatics field and found extensive applications in the biotechnology industry. However, structural information about proteins is difficult to infer from sequence information alone. While sequence similarity generally implies structural similarity between proteins, there exist a large number of protein pairs, including haemoglobin and myoglobin found in the human body, that are structurally similar but possess low sequence similarity (known as twilight zone proteins). Physical comparisons of protein structures [5,6] further demonstrate the need for direct comparison of 3D protein structures, known as the protein structure alignment problem, which is the focus of this paper.
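
To make the superposition part of the structure alignment problem concrete, the sketch below computes the RMSD between two sets of residue coordinates for a fixed residue assignment, after optimally superimposing one set on the other. It is an illustration only and is not taken from SAS-Pro: it uses the standard Kabsch (SVD-based) superposition rather than the derivative-free optimization described in the abstract, and the function names `kabsch_rmsd` and `aligned_rmsd`, along with the coordinate arrays and the `pairs` list, are hypothetical. Because the assignment is passed as an arbitrary list of index pairs, it need not preserve sequence order.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired n x 3 point sets after optimal rigid superposition.

    Row k of P is assumed to be matched with row k of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centering both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def aligned_rmsd(A, B, pairs):
    """RMSD for an assignment given as (index in A, index in B) pairs.

    The pairs may appear in any order, so non-sequential
    assignments are handled the same way as sequential ones.
    """
    i, j = zip(*pairs)
    return kabsch_rmsd(np.asarray(A)[list(i)], np.asarray(B)[list(j)])
```

In a two-stage scheme, a routine like this would be called repeatedly as the assignment changes; the bilevel formulation described in the abstract instead treats assignment and superposition as one problem.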
