Abstract

BackgroundProtein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases.ResultsWe propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms.ConclusionAs a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.

Highlights

  • Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search

  • We propose a linear encoding algorithm, Ramachandran sequential transformation, and introduce an efficient protein structural similarity search method, SARST

  • Algorithm – Ramachandran sequential transformation (RST) One thousand domains, each composed of a single polypeptide chain without missing residues, were randomly selected as the training set from the ASTRAL SCOP 1.67 40% identities (ID) subset [32,33,34] [see Additional file 1]

Read more

Summary

Introduction

Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are analyzed by traditional sequence alignment methods; the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. We aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Martin (2000) developed TOPSCAN, which uses topology strings to represent protein structures [17] Most of these methods could not reach the accuracy comparable to conventional 3D structural comparison methods, and the implementation of some of them were limited because their methodology could not conveniently analyze fragments with missing residues [18]. There are advantages of the one-dimensional (1D) representation of protein structure, such as its easy applicability to multiple structural alignments [16], fold-recognition and genome annotations [19,20]; besides, local backbone structure prediction has long been using linear encoding methodologies [24,25,26,27]

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call