Abstract

Background: Most peptide bonds in proteins occur in the trans conformation. For proline residues, however, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications in understanding protein structure and function.

Results: In this paper, we propose a new approach to predict proline cis/trans isomerization in proteins using a support vector machine (SVM). Preliminary results indicated that Radial Basis Function (RBF) kernels gave better prediction performance than polynomial and linear kernel functions. We used single sequence information with different local window sizes, amino acid compositions of different local sequences, multiple sequence alignments obtained from PSI-BLAST and secondary structure information predicted by PSIPRED, and we explored these sequence encoding schemes to investigate their effects on prediction performance. Training and testing were performed with 5-fold cross-validation on a newly enlarged dataset of 2424 non-homologous proteins whose structures were determined by X-ray diffraction. A window size of 11 gave the best performance for determining proline cis/trans isomerization from the single amino acid sequence. Using multiple sequence alignments in the form of PSI-BLAST profiles significantly improved prediction performance: accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, when coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, about 9 percentage points and 0.17 higher, respectively, than the values achieved with single sequence information.

Conclusion: A new method based on a support vector machine has been developed to predict proline cis/trans isomerization in proteins. It uses the single amino acid sequence with different local window sizes, the amino acid compositions of the local sequence flanking the central proline residue, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
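To make the window-based encodings mentioned in the abstract concrete, the following is a minimal Python sketch of how a local window centered on each proline could be extracted and turned into a one-hot vector or an amino acid composition vector. The exact encoding used by the authors is not given in this text, so the function names (`proline_windows`, `one_hot`, `composition`), the padding symbol `X` and the 21-symbol alphabet are illustrative assumptions rather than the paper's implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"                             # assumed symbol for positions past the chain ends
ALPHABET = AMINO_ACIDS + PAD          # 21 symbols per window position (assumption)


def proline_windows(sequence, window=11):
    """Return (index, window_string) for every proline, with the proline centered."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the proline sits in the center")
    half = window // 2
    seq = sequence.upper()
    # Pad both ends so that prolines near the termini still get a full window.
    padded = PAD * half + seq + PAD * half
    return [(i, padded[i:i + window]) for i, aa in enumerate(seq) if aa == "P"]


def one_hot(window_string):
    """Flatten a window into a binary vector of length len(window) * 21."""
    vector = []
    for aa in window_string:
        symbol = aa if aa in AMINO_ACIDS else PAD  # unknown residues map to PAD
        vector.extend(1 if symbol == s else 0 for s in ALPHABET)
    return vector


def composition(window_string):
    """Fraction of each of the 20 standard residues in the window (padding ignored)."""
    counts = [window_string.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    for index, win in proline_windows("MKTAYPLLGSPEV", window=11):
        print(index, win, len(one_hot(win)))
```

With the window size of 11 reported as best for single sequence input, each proline would map to a vector of 11 × 21 = 231 binary features under this assumed scheme. Such vectors could then be passed to any SVM implementation with an RBF kernel.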

Highlights

  • The majority of peptide bonds in proteins are found to occur in the trans conformation

  • We propose a novel method to predict proline cis/trans isomerization based on a support vector machine, which combines the position-specific scoring matrices (PSSM) extracted from the sequence profiles by PSI-BLAST [19] and the predicted secondary structures generated by the PSIPRED program [20] with the single amino acid sequence information as the SVM input vector

  • Our results indicate that an SVM classifier built on multiple sequence alignments in the form of PSI-BLAST profiles yields better performance: prediction accuracy improved from 62.8% with single sequence to 69.8%, while the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40
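Because the highlights report both accuracy and MCC, a short sketch of how the two metrics are computed from a binary confusion matrix may help. This uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when the denominator is zero, a
    common convention when one row or column of the matrix is empty.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, with cis treated as the positive class.
    tp, tn, fp, fn = 40, 30, 20, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC is usually reported alongside accuracy for problems like this one because it accounts for all four cells of the confusion matrix and is less flattering to a classifier that simply favors the majority class.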


Summary

Introduction

Most peptide bonds in proteins occur in the trans conformation. For proline residues, however, a considerable proportion (about 4–5%) of Xaa-Pro peptide bonds adopt the cis conformation, while only 0.03–0.05% of Xaa-nonPro bonds occur in the cis form [2,3,4]. An increasing number of known protein structures exhibit conformational heterogeneity of one or more prolyl peptide bonds [5]. The isomerization of Xaa-Pro peptide bonds can be catalyzed and accelerated by peptidyl prolyl cis/trans isomerases [10], which are involved in cell signaling and cell replication and are implicated in the induction of severe diseases such as cancer, AIDS, Alzheimer's disease and other neurodegenerative disorders [11]. Proline isomerization functions as a molecular switch owing to its potential ability to control protein activity within the confines of the intrinsic conformational exchange [5].
