Abstract

BackgroundIn structural biology, similarity analysis of protein structure is a crucial step in studying the relationship between proteins. Despite the considerable number of techniques that have been explored within the past two decades, the development of new alternative methods is still an active research area due to the need for high performance tools.ResultsIn this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages: In the first stage, the method generates a topology string based on the geometric details of secondary structure elements, and then, utilizes an n-gram modelling technique over entropy concept to capture similarities in these strings. This initial correspondence map between secondary structure elements is submitted to the second stage in order to obtain the alignment at the residue level. Applying the Kabsch method, a heuristic step-by-step algorithm is adopted in the second stage to align the residues, resulting in an optimal rotation matrix and minimized RMSD. The performance of the method was assessed in different information retrieval tests and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST as three representatives of linear encoding schemes. It is shown that the method obtains a high running speed similar to that of the linear encoding schemes. In addition, the method runs about 800 and 7200 times faster than TM-align and CE respectively, while maintaining a competitive accuracy with TM-align and CE.ConclusionsThe experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.

Highlights

  • In structural biology, similarity analysis of protein structure is a crucial step in studying the relationship between proteins

  • The procedure makes a topology string based on the geometry of the secondary structure elements of each structure followed by the application of the n-gram modelling technique to find the best matching condition between two structures

  • Having secondary structure elements (SSEs) of a protein extracted from its PDB file, TS-AMIR summarizes SSEs in a topology string based on the direction of each SSE vector in 3D-space

Read more

Summary

Introduction

Similarity analysis of protein structure is a crucial step in studying the relationship between proteins. Sequence comparison tools are commonly used to determine the similarities between proteins with a high degree of similarity, whereas structure comparison methods are essentially utilized to highlight the evolutionary relationships among proteins. Considering the large amount of unknown data which exists in structural biology, efficient powerful tools are needed to investigate, analyze and classify the properties and functionalities of this data. The lack of a universal measurement standard for the evaluation of these methods has persuaded biologists to use different tools and scoring functions in their inquiries. This diversity has provided facilities for biologists to extract their required information for query data more efficiently

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call