Abstract
Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.
Highlights
Protein phosphorylation is one of the most important ubiquitous post-translational modifications
In order to evaluate the performance of our method, we have generated Receiver Operating Curve (ROC) for each of the kinase specific phosphorylation site predictor
Receiver operating characteristic (ROC) curve shows the trade-off between True Positive Rate (TPR) i.e., Sensitivity and False Positive Rate (FPR) i.e., 1-Specificity
Summary
Protein phosphorylation is one of the most important ubiquitous post-translational modifications. Researchers are concentrating upon using ensemble mechanism instead of a single classifier for predicting protein phosphorylation sites [14,15, 17] Another method Musite is considered for large scale predictions of both non-kinase and kinase-specific phosphorylation sites [19]. In 2014, Suo et al have proposed a method PSEA for predicting kinase specific phosphorylation sites as well as for analyzing the types of kinases corresponding to all disease-related phosphorylation substrates [22] All of these methods require a good data encoding scheme as it plays a crucial role in affecting the performance of the classifiers. These methods require an a priori knowledge about the computational models of phosphorylation sites to permit automatic annotation
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.