Abstract

Assessing the quality of a protein structure model is essential for protein structure prediction. Here, we developed a Support Vector Machine (SVM) method to predict the quality score (GDT-TS score) of a protein structure model from features extracted from the sequence alignment used to generate the model. The method takes either a query-single-template pairwise alignment or a query-multitemplate alignment as input. For the pairwise alignment scheme, the input features fed into the SVM predictor are the normalized e-value of the given alignment, the percentage of identical residue pairs in the alignment, the percentage of query residues aligned with template residues, and the sum of the BLOSUM scores of all aligned residues divided by the number of aligned positions. Similarly, for the multiple-alignment scheme, the input features are the percentage of target residues aligned with residues in one or more templates, the percentage of aligned target residues that are identical to the corresponding residue in at least one template, the average BLOSUM score of aligned residues and the average Gonnet160 score of aligned residues. An SVM regression predictor was trained on the training data to predict the GDT-TS scores of the models from the input features. The Root Mean Square Error (RMSE) and the Absolute Mean Error (ABS) between predicted and real GDT-TS scores were calculated to evaluate performance. Five-fold cross validation was used to select the best parameter values based on the average RMSE and ABS over the five folds. The RMSE and ABS of the optimized SVM predictor on the testing data were both close to 0.1. This performance indicates that integrating sequence alignment features with an SVM is effective for protein model quality assessment.
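As a rough illustration of the pairwise-alignment features and the two error measures described above, the following Python sketch computes identity, coverage and average substitution score from a gapped alignment, plus RMSE and ABS. It is not the authors' implementation. The function names, the gap character, and the choice of denominators (aligned positions for identity and average score, query length for coverage) are assumptions, and the substitution scores are supplied by the caller rather than taken from a real BLOSUM table. The normalized e-value feature is omitted because the abstract does not say how it is normalized.

```python
import math

GAP = "-"  # assumed gap character in the aligned strings


def pairwise_features(query_aln, template_aln, score):
    """Compute three alignment features from two equal-length gapped strings.

    `score` is a caller-supplied function (residue, residue) -> number,
    standing in for a BLOSUM lookup.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    # Aligned positions: columns where neither sequence has a gap.
    pairs = [(q, t) for q, t in zip(query_aln, template_aln)
             if q != GAP and t != GAP]
    query_len = sum(1 for q in query_aln if q != GAP)
    if not pairs or query_len == 0:
        return {"identity": 0.0, "coverage": 0.0, "avg_score": 0.0}
    identical = sum(1 for q, t in pairs if q == t)
    return {
        "identity": identical / len(pairs),    # identical pairs / aligned positions
        "coverage": len(pairs) / query_len,    # aligned query residues / query length
        "avg_score": sum(score(q, t) for q, t in pairs) / len(pairs),
    }


def rmse(pred, real):
    """Root mean square error between predicted and real scores."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(pred))


def mean_abs_error(pred, real):
    """Mean absolute error (called ABS in the abstract)."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)
```

In practice the feature vectors would be passed to an SVM regression library, and `score` would look up a real substitution matrix. Both are left out here to keep the sketch self-contained.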

Highlights

  • The knowledge of protein three-dimensional (3D) structures is vitally important for biomedical research, such as protein function analysis, mutagenesis experiments and rational drug design

  • The trained pairwise alignment based Support Vector Machine (SVM) predictor was applied to predict the GDT-TS scores of models of 46 CASP9 targets generated from 225 PSI-BLAST alignments that were not used in training

  • The total real GDT-TS score of the top 1 models selected by the SVM predictor for these targets was compared with that of the top 1 models selected according to the e-values of the PSI-BLAST alignments and that of the top 1 models selected by APOLLO [8], a black box quality assessment tool using a pairwise model comparison approach
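The top 1 comparison in the last highlight can be sketched as follows: for each target, pick the single best model under a given ranking criterion, then sum the real GDT-TS scores of the picked models. This is a minimal sketch rather than the evaluation code used in the study. The record fields (`target`, `pred_gdt`, `evalue`, `real_gdt`) and the function name are hypothetical, and the numbers in the usage example are made-up toy values, not results from the paper.

```python
def total_top1_gdt(models, key):
    """Sum the real GDT-TS of the top-ranked model per target.

    `models` is an iterable of dicts with at least 'target' and 'real_gdt'.
    `key` maps a model to a ranking score where higher means better.
    """
    best = {}
    for m in models:
        t = m["target"]
        # Keep the model with the highest ranking score for each target.
        if t not in best or key(m) > key(best[t]):
            best[t] = m
    return sum(m["real_gdt"] for m in best.values())


# Toy data for illustration only (not from the paper).
toy_models = [
    {"target": "T1", "pred_gdt": 0.6, "evalue": 1e-5, "real_gdt": 0.55},
    {"target": "T1", "pred_gdt": 0.7, "evalue": 1e-3, "real_gdt": 0.65},
    {"target": "T2", "pred_gdt": 0.4, "evalue": 1e-10, "real_gdt": 0.30},
    {"target": "T2", "pred_gdt": 0.3, "evalue": 1e-2, "real_gdt": 0.45},
]

# Rank by predicted GDT-TS (higher is better).
by_prediction = total_top1_gdt(toy_models, key=lambda m: m["pred_gdt"])
# Rank by e-value (lower is better, so negate it).
by_evalue = total_top1_gdt(toy_models, key=lambda m: -m["evalue"])
```

Passing the ranking criterion as a `key` function lets the same routine serve every selection method being compared, as long as each method's score is oriented so that higher means better.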

Summary

Introduction

Knowledge of protein three-dimensional (3D) structures is vitally important for biomedical research, such as protein function analysis, mutagenesis experiments and rational drug design. Since the accuracy of predicted protein structures depends on the relatedness of homologous structural templates and the correctness of the sequence alignment [4], assessing the quality of protein structural models is important for controlling and analysing the quality of the predicted models. A number of model quality assessment methods and tools, such as ModelEvaluator [7], APOLLO [8] and QMEAN [9], have been developed. These methods evaluate the quality of models based on structural information extracted from the protein models, without considering the source information (e.g. sequence alignment, homologous template structure) used to generate the models. Quality assessment methods that do not use the source information may be considered a black box approach, while those that do use it [10] may be considered a white box approach [11].

