Abstract

To understand an enzyme's function, identifying its catalytic residues is a common first step. Knowledge of catalytic residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains challenging because it is time-consuming and costly. Computational methods have therefore been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, which uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on two carefully designed datasets, Data604 and Data63. In ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with the other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all eight scores contributed roughly equally to L1pred's performance.
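The abstract describes combining eight per-residue scores with an L1-regularized logistic regression and evaluating the result by AUC and AUPR. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. It assumes each residue is already described by a vector of eight scores (here synthetic Gaussian data, not the paper's datasets). Instead of the L1-logreg software's own solver, it swaps in plain proximal gradient descent (ISTA) on an L1-penalized mean logistic loss. AUPR is estimated by average precision, one common estimator, and a single held-out split stands in for ten-fold cross-validation for brevity. All function and parameter names (`fit_l1_logreg`, `lam`, `lr`, `n_iter`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t, clipping at 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logreg(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA).

    X: per-residue score vectors; y: 1 = catalytic, 0 = non-catalytic.
    The intercept b is left unpenalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # Gradient step on the smooth loss, then the L1 proximal step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(scores, labels):
    # Probability that a random positive outranks a random negative (ties = 1/2).
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    # Mean precision at the rank of each positive; one common AUPR estimator.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, lab) in enumerate(ranked, start=1):
        if lab == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


# Synthetic demo: 300 residues, 8 scores each, 10% "catalytic".
rng = random.Random(0)
X, y = [], []
for i in range(300):
    label = 1 if i % 10 == 0 else 0
    X.append([rng.gauss(1.5 * label, 1.0) for _ in range(8)])
    y.append(label)

w, b = fit_l1_logreg(X[:200], y[:200])
held_out = predict(w, b, X[200:])
auc = roc_auc(held_out, y[200:])
aupr = average_precision(held_out, y[200:])
print(f"AUC={auc:.3f}  AUPR~{aupr:.3f}")
```

The L1 penalty drives the weights of uninformative scores toward exactly zero, which is one way to read per-attribute importance from the fitted model. Because catalytic residues are rare, AUPR is usually far lower than AUC on the same predictions, consistent with the gap between the two values reported above.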

Highlights

  • Enzymes are very important because they act as catalysts for almost all chemical reactions in a cell, raising reaction rates to levels sufficient for life.

  • Of the four methods used for comparison, Jensen-Shannon divergence (JSD), Venn diagram and JSD conservation score (VJSD), and ConSurf do not need a training procedure, whereas CRpred does; it was trained using the same procedure as our method.

  • All chosen methods were also compared on the independent test dataset, Data63, and the results broadly agreed with those found on Data604.
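JSD, one of the training-free comparison methods listed above, scores each alignment column by how far its amino-acid distribution diverges from a background distribution. The following is a minimal sketch of that idea, not the exact published scoring. It assumes a uniform background by default (published JSD scorers typically estimate the background from a substitution matrix such as BLOSUM62), a small pseudocount, a 0.5 mixing weight, and a simple penalty for gappy columns. It omits sequence weighting and window smoothing. The names `jsd_column_score` and `kl_bits` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_bits(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_column_score(column, background=None, pseudocount=1e-6, lam=0.5):
    """Gap-weighted Jensen-Shannon divergence of one alignment column.

    column: one residue per aligned sequence, '-' for gaps.
    background: 20 amino-acid frequencies in AMINO_ACIDS order; uniform if None
    (an illustrative assumption, not the published default).
    """
    if background is None:
        background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    p = [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]
    # Mixture distribution between the column and the background.
    m = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, background)]
    jsd = lam * kl_bits(p, m) + (1 - lam) * kl_bits(background, m)
    # Penalise gappy columns by the fraction of non-gap positions.
    gap_fraction = column.count("-") / len(column)
    return jsd * (1.0 - gap_fraction)


for col in ["CCCCCCCCCC", "ACDEFGHIKL", "CCCCC-----"]:
    print(col, round(jsd_column_score(col), 3))
```

With a 0.5 mixing weight and base-2 logarithms, the score lies between 0 (column identical to the background) and 1. Fully conserved columns, which are typical of catalytic positions, score near the top of that range.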

Summary

Introduction

Enzymes are very important because they act as catalysts for almost all chemical reactions in a cell, raising reaction rates to levels sufficient for life. However, the number of proteins with known catalytic sites is still small compared with the huge number of enzymes, because experimentally identifying catalytic residues is often expensive and time-consuming. As more and more annotated enzymes become available, computational methods have become an important tool for predicting catalytic residues. Over the past decade and a half, many computational methods have been developed to predict catalytic residues in given enzymes. Machine learning algorithms, such as Support Vector Machines (SVM) and Neural Networks (NN), have been used to develop new catalytic residue prediction methods [33,34,35,36,37,38,39,40]. These algorithms can integrate various chemical and physical features of residues, such as sequence conservation, residue type, cumulative hydrophobicity, secondary structure, and relative solvent accessibility. These efforts have demonstrated the promise of computational methods in this area, yet higher prediction accuracy is still needed.
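Machine-learning predictors of this kind typically turn each residue into a numeric feature vector before training. The sketch below shows one way to assemble such vectors from sequence alone, under assumptions that go beyond the text. "Cumulative hydrophobicity" is not defined here, so the sketch assumes a windowed sum of Kyte-Doolittle hydropathy values. Residue type is encoded one-hot over the 20 standard amino acids. The conservation score is taken as an externally computed per-residue input. The names `window_hydrophobicity`, `residue_features`, and `half_window` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_hydrophobicity(sequence, half_window=3):
    """Sum of hydropathy values in a window centred on each residue.

    Windows are truncated at the chain ends; non-standard residues count as 0.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    sums = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        sums.append(sum(values[lo:hi]))
    return sums


def residue_features(sequence, conservation, half_window=3):
    """One vector per residue: [hydrophobicity, one-hot type (20), conservation]."""
    if len(conservation) != len(sequence):
        raise ValueError("need one conservation score per residue")
    hydro = window_hydrophobicity(sequence, half_window)
    features = []
    for aa, h, c in zip(sequence.upper(), hydro, conservation):
        one_hot = [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]
        features.append([h] + one_hot + [c])
    return features


feats = residue_features("MKHDS", [0.2, 0.1, 0.9, 0.8, 0.3], half_window=1)
print(len(feats), len(feats[0]))  # → 5 22
```

Structure-based features such as secondary structure or relative solvent accessibility would be appended to each vector in the same way, but they require a structure or a separate predictor and are left out of this sequence-only sketch.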

