Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information.

Kuldip K Paliwal, Abdollah Dehzangi, Alok Sharma, James Lyons

doi:10.1186/1471-2105-15-s16-s12

Open Access

Abstract

Deciphering the three-dimensional structure of a protein from its sequence is a challenging task in biological science. Protein fold recognition and protein secondary structure prediction are intermediate steps toward identifying the three-dimensional structure of a protein. For protein fold recognition, evolutionary-based information about amino acid sequences derived from the position specific scoring matrix (PSSM) has recently been applied with improved results. Separately, the SPINE-X predictor has been developed and applied for protein secondary structure prediction. Several reported methods for protein fold recognition achieve only limited accuracy. In this paper, we develop a strategy that combines evolutionary-based information (from the PSSM) with secondary structure predicted by SPINE-X to improve protein fold recognition. The strategy is based on computing the probabilities of amino acid pairs (AAP). The proposed method has been tested on several benchmark protein datasets, and an improvement of 8.9% in recognition accuracy has been achieved. For the first time, we achieve over 90% and 75% prediction accuracy for sequence similarity values below 40% and 25%, respectively. We also obtain 90.6% and 77.0% prediction accuracy on the Extended Ding and Dubchak and the Taguchi and Gromiha benchmark protein fold recognition datasets, respectively, both of which are widely used in the literature.

Highlights

Recognition of protein folds is an essential step in identifying the tertiary structure of proteins
We have developed a strategy of combining evolutionary-based information and predicted secondary structure using SPINE-X to improve protein fold recognition
We developed a k-amino acid pair (AAP) feature extraction method based on the position specific scoring matrix (PSSM) and the secondary structure prediction matrix (SSPM), and showed its usefulness on several benchmark protein datasets
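The k-AAP idea can be sketched as follows, assuming each profile is an L × C matrix of per-residue values in [0, 1]: a PSSM with C = 20 columns (squashed here with a logistic function, which is an assumed preprocessing step, not necessarily the authors' choice) and an SSPM with C = 3 columns holding SPINE-X-style helix/strand/coil probabilities (also an assumed layout). For a gap k, each feature sums the products P[i][m] · P[i + k][n] over all residue positions, giving C × C values per matrix. The function names `scale_pssm`, `pair_features`, and `combined_features` are hypothetical, and the exact normalization used in the paper may differ.

```python
import math
from typing import List

Matrix = List[List[float]]  # L rows (residues) x C columns (amino acids or SS states)


def scale_pssm(raw: Matrix) -> Matrix:
    """Map raw PSSM scores into (0, 1) with a logistic function.

    Assumed preprocessing for illustration; raw PSSM scores are small
    integers, so math.exp does not overflow here.
    """
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in raw]


def pair_features(profile: Matrix, k: int = 1) -> List[float]:
    """k-separated pair features: F[m][n] = sum_i profile[i][m] * profile[i + k][n].

    Returns the C x C matrix flattened row-major (index m * C + n).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    width = len(profile[0]) if profile else 0
    features = [0.0] * (width * width)
    # Walk every residue i that has a partner k positions downstream.
    for i in range(len(profile) - k):
        left, right = profile[i], profile[i + k]
        for m in range(width):
            for n in range(width):
                features[m * width + n] += left[m] * right[n]
    return features


def combined_features(pssm: Matrix, sspm: Matrix, k: int = 1) -> List[float]:
    """Concatenate PSSM-based (20 x 20) and SSPM-based (3 x 3) pair features."""
    return pair_features(pssm, k) + pair_features(sspm, k)
```

Under these assumed shapes, `combined_features` returns 400 + 9 = 409 values for a given k. The plain-list implementation keeps the sketch dependency-free; with NumPy the same C × C block is a single matrix product of the profile's first L − k rows (transposed) with its last L − k rows.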

Summary

Introduction

Recognition of protein folds is an essential step in identifying the tertiary structure of proteins. Dubchak et al. [1] showed the importance of syntactical and physicochemical features in protein fold recognition by using amino acid composition (AAC) in conjunction with five physicochemical attributes of amino acids: hydrophobicity (H), polarity (P), van der Waals volume (V), predicted secondary structure based on the normalized frequency of α-helix (X), and polarizability (Z). Their 125-dimensional feature set is composed of the 20 AAC features together with 105 physicochemical features (21 per attribute). For other feature extraction and selection methods, see [33,34,35,36,37,38,39,40].
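To make the 20 + 5 × 21 = 125 arithmetic concrete, the sketch below computes the 20 AAC features and the 21 per-attribute descriptors (3 composition, 3 transition, 15 distribution values, often called CTD descriptors) for a single attribute. The three-class hydrophobicity grouping is a commonly used one but is treated here as an assumption, as is the rounding rule for locating the 25%, 50%, and 75% residues; the function names are hypothetical. The other four attributes would use analogous three-class groupings that are not reproduced here.

```python
from math import floor
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed three-class grouping for hydrophobicity: polar, neutral, hydrophobic.
HYDROPHOBICITY_CLASSES: List[str] = ["RKEDQN", "GASTPHY", "CLVIMFW"]


def amino_acid_composition(seq: str) -> List[float]:
    """20 AAC features: percentage of each standard amino acid in seq."""
    if not seq:
        raise ValueError("empty sequence")
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def ctd_features(seq: str, classes: List[str]) -> List[float]:
    """21 features for one attribute: composition, transition, distribution."""
    if not seq:
        raise ValueError("empty sequence")
    lookup: Dict[str, int] = {aa: c for c, group in enumerate(classes) for aa in group}
    labels = [lookup[aa] for aa in seq]  # raises KeyError on non-standard residues
    n = len(labels)

    # Composition: percentage of residues falling in each class.
    composition = [100.0 * labels.count(c) / n for c in range(3)]

    # Transition: percentage of adjacent pairs that switch between two
    # classes (in either direction) for class pairs (0,1), (0,2), (1,2).
    pairs = list(zip(labels, labels[1:]))
    transition = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        switches = sum(1 for x, y in pairs if {x, y} == {a, b})
        transition.append(100.0 * switches / max(len(pairs), 1))

    # Distribution: position (as % of sequence length) of the first residue
    # of each class and of the residues marking 25%, 50%, 75%, 100% of it.
    distribution = []
    for c in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == c]
        count = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if count == 0:
                distribution.append(0.0)
                continue
            idx = max(1, floor(frac * count))  # 1-based occurrence index (assumed rounding)
            distribution.append(100.0 * positions[idx - 1] / n)

    return composition + transition + distribution
```

Concatenating `amino_acid_composition(seq)` with the 21-value output for each of the five attributes yields 20 + 5 × 21 = 125 features.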

Methods

Results

Conclusion