Abstract

Background: The prediction of calmodulin-binding (CaM-binding) proteins plays an important role in biology and biochemistry, because calmodulin binds and regulates a multitude of protein targets affecting different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short-linear motifs (SLiMs) have been used effectively as features for analyzing protein-protein interactions, but their properties have not been utilized in the prediction of CaM-binding proteins.

Results: We propose a new method for the prediction of CaM-binding proteins that uses, as features for the prediction module, both the total and average scores of known and new SLiMs in protein sequences, computed with a new scoring method called sliding window scoring (SWS). A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins has been obtained and used for testing the proposed model. The motif generation tool Multiple EM for Motif Elucidation (MEME) has been used to obtain new motifs from each of the positive and negative datasets individually (the SM approach) and from the combined negative and positive datasets (the CM approach). Moreover, the wrapper criterion with random forest has been applied for feature selection (FS), followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF).

Conclusions: Our proposed method shows very good prediction results and demonstrates that the information contained in SLiMs is highly relevant in predicting CaM-binding proteins. Further, three new CaM-binding motifs have been computationally selected and biologically validated in this study; these can be used for predicting CaM-binding proteins.

Highlights

  • The prediction of calmodulin-binding (CaM-binding) proteins plays a very important role in the fields of biology and biochemistry, because the calmodulin protein binds and regulates a multitude of protein targets affecting different cellular processes

  • To test our proposed method and analyze in depth the strength of short-linear motifs (SLiMs) as predictive features, four classification methods, namely support vector machines (SVM), k-nearest neighbors (k-NN), random forest (RF) and naive Bayes (NB), and different feature selection methods, including Chi2 and the wrapper RF method, have been applied to our datasets using the Waikato Environment for Knowledge Analysis (WEKA) version 3.7.11 [17]

  • The performances of the prediction methods are compared in terms of their areas under the receiver operating characteristic (ROC) curve, accuracies, and Matthews correlation coefficients (MCC)
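
The accuracy and MCC named in the highlights are conventionally computed from the four confusion-matrix counts. The formulas used by the authors are not reproduced in this excerpt, so the following is only a minimal sketch assuming the standard textbook definitions; the function names and the zero-denominator convention are illustrative choices, not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct (standard definition)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (standard definition).

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Assumption: return 0.0 when the denominator is zero, a common
    convention for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are NOT results
    # reported in the paper.
    tp, fn, tn, fp = 180, 14, 170, 23
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC is often reported alongside accuracy because it uses all four counts and stays informative when one class dominates; with the nearly balanced dataset described here (194 positives, 193 negatives), the two measures would be expected to move together.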


Summary

Introduction

The prediction of calmodulin-binding (CaM-binding) proteins plays an important role in biology and biochemistry, because calmodulin binds and regulates a multitude of protein targets affecting different cellular processes. Calmodulin (CaM) is a calcium-binding protein that is a major transducer of calcium signaling [1] and a key signaling molecule in multicellular organisms. It has no enzymatic activity of its own; rather, it acts by binding to a panel of cellular protein targets, at a variety of motifs, and altering their activity. The Hidden Markov Model prediction tool in the Calmodulin Target Database [2] is limited to the classic CaM-binding motifs and cannot identify novel ones.
