Abstract

Calpain, an intracellular Ca2+-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism of substrate recognition and cleavage by calpain remains largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, this approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and the limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over the highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that significant specificity differences exist across calpain sub-types, contrary to previous assumptions. Prediction accuracy was further validated using, as an unbiased test set, mutated sequences of calpastatin (the endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
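
As a rough sketch of the general Multiple Kernel Learning idea (not necessarily the exact variant used in this work), the classifier replaces the single SVM kernel with a non-negative weighted sum of per-feature kernels. The three kernels below, one per feature type named in the abstract, the symbols, and the sum-to-one normalization (a common choice) are illustrative assumptions:

```latex
% Assumed notation: K_seq, K_ss, K_acc are kernels on primary sequence,
% secondary structure and solvent accessibility; beta_m are learned weights.
\begin{align}
K(x_i, x_j) &= \sum_{m \in \{\mathrm{seq},\,\mathrm{ss},\,\mathrm{acc}\}} \beta_m \, K_m(x_i, x_j),
  \qquad \beta_m \ge 0, \quad \sum_m \beta_m = 1, \\
f(x) &= \operatorname{sign}\!\Big( \sum_i \alpha_i \, y_i \, K(x_i, x) + b \Big).
\end{align}
```

Under this formulation, the learned weights \beta_m give a rough indication of how much each feature type contributes to the decision function.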

Highlights

  • Calpain (EC 3.4.22.17, Clan CA, family C02) is an intracellular Ca2+-dependent cysteine protease known to regulate substrate functions by limited proteolysis, i.e. proteolytic processing [1,2,3,4,5,6,7,8], resulting in the modulation of a wide variety of biological phenomena

  • In order to keep the size of input features down and avoid unnecessary noise, it was critical to accurately narrow down sequence regions directly or indirectly involved in substrate recognition and cleavage for each type of feature

  • Despite using no more input data than single kernel methods (Table 2, I99), our method resulted in a considerable Area under ROC Curve (AUC) increase from the baseline score of 76.86%



Introduction

Calpain (EC 3.4.22.17, Clan CA, family C02) is an intracellular Ca2+-dependent cysteine protease known to regulate substrate functions by limited proteolysis, i.e. proteolytic processing [1,2,3,4,5,6,7,8], resulting in the modulation of a wide variety of biological phenomena. For calpains to modulate substrate functions precisely, the cleavage sites are expected to be strictly determined for each substrate [15]. The positions of the cleavage sites are essential determinants of how calpains modulate substrate functions. If cleavage sites are determined, antibodies specific to the sites [16,17,18] and inhibitors of specific substrate proteolysis [19,20,21] can be designed to analyze proteolytic events by calpain under various conditions. Although many studies have attempted to predict calpain cleavage sites [22,23,24], precise prediction has not been achieved so far.
