Abstract

Protein phosphorylation is a posttranslational modification (PTM or PTLM) in which a phosphoryl group is added to one or more residues of a protein molecule. The most commonly phosphorylated amino acids are serine (S), threonine (T), and tyrosine (Y). Protein phosphorylation plays a significant role in a wide range of cellular processes, and its dysregulation is implicated in many diseases. Therefore, from the perspectives of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many S, T, or Y residues, which ones can be phosphorylated and which cannot? To address this problem, we have developed a predictor called iPhos-PseEn by fusing four different pseudo component approaches (amino acids’ disorder scores, nearest neighbor scores, occurrence frequencies, and position weights) into an ensemble classifier via a voting system. Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iPhos-PseEn has been established at http://www.jci-bioinfo.cn/iPhos-PseEn, by which users can easily obtain their desired results without needing to work through the complicated mathematical equations involved.
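The voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the example votes, and the tie rule (a 2-versus-2 split counts as "not phosphorylated") are all assumptions made here for illustration, since the abstract does not specify how the four base predictions are combined beyond "a voting system".

```python
from typing import Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Combine binary base-classifier votes (1 = site, 0 = non-site).

    Returns 1 only when strictly more than half of the votes are 1.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    positives = sum(1 for v in votes if v == 1)
    # Assumption: a tie (e.g. 2 vs 2 with four base classifiers)
    # is resolved as negative. The source does not state a tie rule.
    return 1 if 2 * positives > len(votes) else 0


# Hypothetical votes for one candidate residue, one per feature type
# named in the abstract (the values themselves are made up).
votes = {
    "disorder_score": 1,
    "nearest_neighbor_score": 1,
    "occurrence_frequency": 0,
    "position_weight": 1,
}
prediction = majority_vote(list(votes.values()))
```

A weighted vote or a probability average would be equally reasonable choices; simple majority is used here only because it is the shortest correct instance of voting.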

Highlights

  • Cancer and many other major diseases are often caused by a variety of subtle modifications in biological sequences, typically by various types of post-translational modification (PTM or PTLM) in proteins [1, 2], postreplication modification (PTRM) in DNA [3], and posttranscription modification (PTCM) in RNA [4]

  • The success rates achieved by the iPhos-PseEn predictor via 5-fold cross-validation for S-, T-, and Y-type phosphorylation are given in Table 1, where, to facilitate comparison, the corresponding rates for Musite [22] and Position Weight Amino Acid Composition (PWAAC) [24] are also listed

  • As the table shows, iPhos-PseEn is remarkably better than its counterparts in predicting all three phosphorylation types as measured by all four metrics, clearly indicating that the proposed predictor not only achieves higher sensitivity, specificity, and overall accuracy but is also much more stable
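For readers who want to reproduce this kind of comparison, the metrics can be computed from confusion-matrix counts as sketched below. The highlights name sensitivity, specificity, and overall accuracy; the fourth metric is not named in this text, so the sketch assumes it is the Matthews correlation coefficient (MCC). The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    tp/fn: true sites predicted as site / non-site.
    tn/fp: true non-sites predicted as non-site / site.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Each ratio falls back to 0.0 when its denominator is zero.
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    acc = (tp + tn) / total                     # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: MCC is the fourth metric; the text does not name it.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Made-up counts for illustration only; these are not results from Table 1.
example = binary_metrics(tp=80, tn=90, fp=10, fn=20)
```

In a 5-fold cross-validation these counts would be accumulated over the five held-out folds before the ratios are taken.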


Summary

Introduction

Cancer and many other major diseases are often caused by a variety of subtle modifications in biological sequences, typically by various types of post-translational modification (PTM or PTLM) in proteins [1, 2], postreplication modification (PTRM) in DNA [3], and posttranscription modification (PTCM) in RNA [4]. To reveal the pathological mechanisms of these diseases and find new and revolutionary strategies to treat them, many efforts have been made to identify possible modification sites in protein (see, e.g., [5,6,7,8,9,10,11,12,13,14]), DNA [15, 16], and RNA sequences [17, 18]. Protein phosphorylation is one of the most-studied post-translational modifications (PTM or PTLM); it can alter the structural conformation of a protein, activating it, deactivating it, or otherwise modifying its function. Information on phosphorylation sites in proteins is significant for both basic research and drug development.

