Abstract

Protein phosphorylation is an important post-translational modification that regulates many cellular activities in the human body. Accurate identification of phosphorylation sites can provide new insights into the specific functions of proteins. However, experimental techniques for identifying phosphorylation sites are time-consuming and inefficient, so computational approaches are an attractive alternative in the era of big data. In this work, we designed a new computational method to identify phosphorylation sites. First, phosphorylation data were collected from human proteins to construct an objective and strict benchmark dataset. Through a series of feature analyses, we found that the distributions of conservation scores and nine physicochemical properties surrounding phosphorylation sites in positive samples differ significantly from those surrounding non-phosphorylation sites in negative samples. Based on these features, we proposed a novel sequence-based method for predicting phosphorylation sites in the human proteome, which combines conservation scores with position-associated attributes that reflect the correlation of physicochemical characteristics among amino acid residues. Furthermore, analysis of variance (ANOVA) was used to obtain the optimal feature subset, that is, the one that produced the highest accuracy. Comparison with a published predictor demonstrated the superiority of our predictor. Finally, a user-friendly online tool called iPhoPred was established and is freely available at http://lin-group.cn/server/iPhoPred/ . We hope the tool will provide useful guidance for the study of protein phosphorylation.
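The ANOVA-based feature ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`anova_f_score`, `rank_features`) and the toy feature values are hypothetical, and the sketch only ranks features by their one-way ANOVA F statistic between positive and negative samples. The later step of choosing the subset with the highest accuracy is not shown.

```python
from statistics import mean


def anova_f_score(groups):
    """One-way ANOVA F statistic for a single feature.

    `groups` holds one list of feature values per class
    (here: positive and negative samples).
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-class variance: how far each class mean is from the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-class variance: spread of the values around their own class mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - k)
    return float("inf") if within == 0 else between / within


def rank_features(positives, negatives):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(positives[0])
    scores = []
    for j in range(n_features):
        pos_col = [row[j] for row in positives]
        neg_col = [row[j] for row in negatives]
        scores.append((anova_f_score([pos_col, neg_col]), j))
    # Sort by descending F score; ties are broken by feature index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy data (invented for illustration): rows are samples, columns are features.
positives = [[0.9, 0.5, 1.0], [0.8, 0.4, 3.0], [1.0, 0.6, 2.0]]
negatives = [[0.1, 0.5, 2.0], [0.2, 0.6, 1.0], [0.0, 0.4, 3.0]]

ranking = rank_features(positives, negatives)
top_score = anova_f_score([[r[0] for r in positives], [r[0] for r in negatives]])
```

In this toy example only the first feature separates the two classes, so it receives the largest F score and is ranked first. In practice the ranked list would feed a search over the top-ranked features, with the subset size chosen by cross-validated accuracy.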

Highlights

  • Most proteins undergo post-translational modifications (PTMs) that endow the raw proteins with proper structure and specific function

  • The results showed that position conservation scores of positive samples are generally much higher than those of negative samples, and the physicochemical properties of positive samples differ clearly from their counterparts in the negative dataset

  • The position conservation scores in phosphoserine samples are significantly higher than those in non-phosphoserine samples, demonstrating that the flanking sequences around phosphoserine sites are more conserved


Summary

Introduction

Most proteins undergo post-translational modifications (PTMs) that endow the raw proteins with proper structure and specific function. Several studies have shown that more than 200 types of PTMs exist [1], [2]. Protein phosphorylation is one of the most widespread PTM types and regulates almost all aspects of cell life, including proliferation, differentiation, metabolism, DNA replication, cell division and apoptosis [3], [4]. Catalyzed by kinases, protein phosphorylation transfers a phosphate group (PO4) from adenosine triphosphate (ATP) to the targeted residues, namely serine (S), threonine (T) and tyrosine (Y) [5]. The phosphate groups can be removed by phosphatases.

