TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

Jiangning Song, Mingjun Wang, Geoffrey I Webb, Tatsuya Akutsu, Hao Tan

doi:10.1371/journal.pone.0030361

Abstract

Protein backbone torsion angles Phi and Psi are the two rotation angles about the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linking peptide bonds are planar and rigid, these two angles essentially determine the backbone geometry of a protein. Accurate prediction of backbone torsion angles from sequence information can therefore assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and < 7.97e-150 for Phi and Psi, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
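Because Phi and Psi are defined as rotations about backbone bonds, it may help to see how the observed angles can be computed from atomic coordinates. The sketch below is a minimal illustration, not TANGLE's code: it assumes the standard IUPAC convention (Phi from atoms C(i-1), N, Cα, C of residue i; Psi from N, Cα, C of residue i and N(i+1)) and a hypothetical per-residue dictionary with "N", "CA" and "C" coordinate tuples.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees, in [-180, 180]) defined by four points."""
    b0 = _sub(p0, p1)  # outer bond p1 -> p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2 (the rotation axis)
    b2 = _sub(p3, p2)  # outer bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project both outer bonds onto the plane perpendicular to the axis.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    # atan2 of the sine-like and cosine-like components gives the signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def backbone_phi_psi(
    residues: List[Dict[str, Vec]]
) -> List[Tuple[Optional[float], Optional[float]]]:
    """(Phi, Psi) per residue; angles undefined at the chain termini are None."""
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` evaluates to about +90°, while moving the last point to `(-1, 0, 1)` gives the trans (±180°) arrangement. Using `atan2` rather than `acos` keeps the sign of the rotation, which is what distinguishes, for example, right-handed helical Phi values from left-handed ones.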

Highlights

As a result of the completion of whole-genome sequencing projects, the sequence-structure gap is rapidly increasing.
The single-peak distribution of Phi angles and the double-peak distribution of Psi angles in the Ramachandran plot result in different degrees of uncertainty, and hence different prediction accuracies, for the two types of torsion angles [11]. Because of their double-peak distribution, Psi angles are more difficult to predict than the single-peak Phi angles, which is reflected in higher mean absolute error (MAE) and root mean square error (RMSE) values for Psi angles and lower values for Phi angles.
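The link between a double-peak distribution and higher MAE/RMSE can be illustrated with a toy calculation. This sketch assumes that angular errors are wrapped into [-180°, 180°) before averaging, a common convention for periodic angles (this section does not state TANGLE's exact convention), and the observed angles are hypothetical values placed roughly in the helix and strand regions, not data from the paper.

```python
import math
from typing import Sequence


def angular_error(pred: float, obs: float) -> float:
    """Smallest signed difference pred - obs on the circle, in [-180, 180)."""
    return (pred - obs + 180.0) % 360.0 - 180.0


def angular_mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error over wrapped angular differences (degrees)."""
    errors = [abs(angular_error(p, o)) for p, o in zip(pred, obs)]
    return sum(errors) / len(errors)


def angular_rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error over wrapped angular differences (degrees)."""
    squared = [angular_error(p, o) ** 2 for p, o in zip(pred, obs)]
    return math.sqrt(sum(squared) / len(squared))


# Wrap-around matters: 170 deg and -170 deg differ by 20 deg, not 340 deg.
print(abs(angular_error(170.0, -170.0)))  # 20.0

# Toy single-peak "Phi" values: a constant guess at the peak is close to all of them.
phi_obs = [-60.0, -70.0, -60.0, -70.0]
print(angular_mae([-65.0] * 4, phi_obs))  # 5.0

# Toy double-peak "Psi" values (helix-like and strand-like): no single constant
# guess can be closer than 85 deg on average, since the modes are 170 deg apart.
psi_obs = [-40.0, 130.0, -40.0, 130.0]
print(angular_mae([45.0] * 4, psi_obs))   # 85.0
```

A real predictor does far better than a constant guess, but the same effect applies in milder form: when the sequence features cannot decide which Psi mode a residue occupies, a wrong guess costs roughly the distance between the two peaks.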
Case study: To understand where the difficulties of torsion angle prediction arise, and to illustrate the significance of the correlation coefficient (CC), RMSE and MAE measures used in this study, we present three illustrative examples of TANGLE predictions and compare the predicted and observed Phi and Psi angle profiles (Figure 7) for the beta1 subunit of the signal-transducing G protein heterotrimer (PDB ID: 1b9x, chain A) [85], enzyme IIA^Lactose from Lactococcus lactis (PDB ID: 1e2a, chain A) [86] and bee venom hyaluronidase in complex with a hyaluronic acid tetramer (PDB ID: 1fcv, chain A) [87].
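One reason to report CC alongside MAE or RMSE is that they capture different things: a predicted profile that follows the ups and downs of the observed profile but is shifted by a constant offset has a perfect correlation yet a non-zero MAE. The sketch below assumes CC is the Pearson correlation computed on raw angle values; the profile is hypothetical and not taken from the three proteins above.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("CC is undefined for a constant profile")
    return cov / (sx * sy)


def angular_mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """MAE with errors wrapped into [-180, 180) to respect angle periodicity."""
    errs = [abs((p - o + 180.0) % 360.0 - 180.0) for p, o in zip(pred, obs)]
    return sum(errs) / len(errs)


# Hypothetical observed Psi profile and a prediction offset by +30 degrees.
observed = [-45.0, -40.0, 120.0, 135.0, -50.0, 140.0]
predicted = [o + 30.0 for o in observed]

print(round(pearson_cc(predicted, observed), 6))  # trends match perfectly
print(angular_mae(predicted, observed))           # yet every residue is 30 deg off
```

Here CC is 1 (up to floating-point rounding) while MAE is 30°, so a high CC alone does not guarantee accurate angles, and a low MAE alone does not show that the profile's shape is reproduced.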

Summary

Introduction

As a result of the completion of whole-genome sequencing projects, the sequence-structure gap is rapidly increasing. In this context, the accurate prediction of protein structure and function from sequence remains a challenging task. With respect to torsion angles, there is increasing interest in the field of structural bioinformatics in developing efficient algorithms that can accurately predict protein backbone torsion angles from amino acid sequences. This is because torsion angles provide a more detailed description of backbone conformation; if known, they can significantly reduce the conformational search space and contribute to the prediction of protein three-dimensional structure. Predicted torsion angles have been applied to improve protein secondary structure prediction [17,18], protein fold recognition [19,20,21], multiple sequence alignment [22,23] and fragment-free tertiary structure prediction [10].

Methods

Results

Conclusion