Abstract
Background: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Because high-throughput mass spectrometry-based experiments are difficult to perform, there is a desire to predict phosphorylation sites using computational methods. However, previous studies on in silico prediction of plant phosphorylation sites have not considered kinase-specific phosphorylation data. We are therefore motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites.
Results: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species Arabidopsis thaliana. To investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation of the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent tests using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups show amino acid conservation similar to the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms.
Conclusions: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. In both cross-validation and independent testing, the MDD-clustered models outperform models trained without MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos (http://csb.cse.yzu.edu.tw/PlantPhos/). Additionally, two case studies are presented to further evaluate the effectiveness of PlantPhos.
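The MDD clustering step mentioned in the abstract can be illustrated with a minimal sketch of a single split. This is not the PlantPhos implementation: the function names (`chi_square`, `most_dependent_position`, `mdd_split`) and the toy peptides are hypothetical, raw residues are used instead of amino acid groups, and no significance threshold or recursion is applied. The sketch assumes equal-length peptides aligned on the phosphorylated residue.

```python
from collections import Counter
from itertools import product


def chi_square(seqs, i, j):
    """Pearson chi-square statistic between the residues at columns i and j."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    pair = Counter((s[i], s[j]) for s in seqs)
    stat = 0.0
    for a, b in product(count_i, count_j):
        # Expected co-occurrence count if the two columns were independent.
        expected = count_i[a] * count_j[b] / n
        stat += (pair[(a, b)] - expected) ** 2 / expected
    return stat


def most_dependent_position(seqs, center):
    """Return the column whose summed chi-square against all other columns is largest.

    The central (phosphorylated) column is skipped because it is constant.
    """
    positions = [p for p in range(len(seqs[0])) if p != center]
    scores = {
        i: sum(chi_square(seqs, i, j) for j in positions if j != i)
        for i in positions
    }
    return max(scores, key=scores.get)


def mdd_split(seqs, center):
    """One MDD step: split on the most frequent residue at the most dependent column."""
    pos = most_dependent_position(seqs, center)
    residue = Counter(s[pos] for s in seqs).most_common(1)[0][0]
    with_residue = [s for s in seqs if s[pos] == residue]
    without_residue = [s for s in seqs if s[pos] != residue]
    return pos, residue, with_residue, without_residue


# Hypothetical 5-mers with the phosphoserine at index 2 (illustrative only).
peptides = ["RGSPA", "RASPA", "RGSPA", "RASPA", "AGSDA", "AASDA"]
pos, residue, group_a, group_b = mdd_split(peptides, center=2)
print(pos, residue, group_a, group_b)
```

In a fuller pipeline, each resulting subgroup would be split again until no position shows significant dependence, and a separate profile HMM would then be trained per subgroup, as the abstract describes.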
Highlights
Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction
Data clustering by maximal dependence decomposition: One of the aims of this study is to investigate the substrate site specificity of plant phosphorylation sites based on amino acid sequences
Investigation of substrate site specificities: This work aims to investigate the various substrate site specificities in the plant species Arabidopsis thaliana based on amino acid sequences
Summary
Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. We are motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites. Protein phosphorylation is the most widespread and well-studied post-translational modification in eukaryotic cells, and one of the most prevalent intracellular protein modifications influencing numerous cellular processes [1]. It regulates various cellular processes in both mammals and plants. Phosphorylation is involved in modulating a sucrose phosphate synthase enzyme that controls the signaling pathway for sucrose synthesis in plants [6]. Although Stone et al. have identified a number of plant kinases, the precise functional roles of specific protein kinases have not been widely elucidated [4].