Abstract

As one of the most important post-translational modifications (PTMs), phosphorylation refers to the covalent attachment of a phosphate group to amino acid residues such as Ser (S), Thr (T) and Tyr (Y), which results in diverse functions at the molecular level. Abnormal phosphorylation has been shown to be closely related to human diseases. To our knowledge, no study has yet addressed the prediction of specific disease-associated phosphorylation sites, which is of great significance for a comprehensive understanding of disease mechanisms. In this work, focusing on three types of leukemia, we aim to develop reliable leukemia-related phosphorylation site prediction models by combining a deep convolutional neural network (CNN) with transfer learning. A CNN can automatically discover complex representations of phosphorylation patterns from raw sequences, and hence provides a powerful tool for improving leukemia-related phosphorylation site prediction. With the largest dataset, that of myelogenous leukemia, the optimal models for S/T/Y phosphorylation sites give AUC values of 0.8784, 0.8328 and 0.7716, respectively. When transferred to the small datasets, the models for T-cell and lymphoid leukemia also give promising performance by sharing the optimal parameters. Compared with five other machine-learning methods, our CNN models show superior performance. Finally, the leukemia-related pathogenesis analysis and distribution analysis of phosphorylated proteins, along with K-means clustering analysis and position-specific conservation profiles of the phosphorylation sites, all indicate the strong practical feasibility of our easy-to-use CNN models.

Highlights

  • Post-translational modifications (PTMs) of proteins, which are covalent and generally enzymatic modifications, are a pivotal mechanism for regulating cellular functions and play vital roles in various biological processes [1]

  • To gain in-depth knowledge of the genes associated with leukemia, we performed functional pathway enrichment analysis using the Metascape database [33], and 1707 genes were found in all enriched pathways

  • Many researchers have focused on cellular morphological changes in leukemia, indicating that these changes play an important role as biomarkers of the disease [34,35]

Summary

Introduction

Post-translational modifications (PTMs) of proteins, which are covalent and generally enzymatic modifications, are a pivotal mechanism for regulating cellular functions and play vital roles in various biological processes [1]. Phosphorylation, one of the most important PTMs, covalently attaches phosphate moieties to Ser (S), Thr (T) or Tyr (Y) residues in a dynamic manner [3,4] and regulates many cellular processes such as DNA growth, metabolism and cell cycle control [5,6]. A number of phosphorylation sites have been accurately verified by different experimental techniques, and related databases have been built, such as dbPSP 2.0 [7], PhosphoPep [8] and Phospho.ELM [9]. Traditional machine-learning models were developed by manually extracting effective features to represent phosphorylation site information, such as Shannon entropy, relative entropy, information gain, protein disorder, and the average cumulative hydrophobicity [10–12].
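To make the idea of a hand-crafted feature concrete, the following minimal sketch computes the per-position Shannon entropy of sequence windows centered on candidate S/T/Y residues. It is an illustration under stated assumptions, not the feature pipeline of the cited works: the function names `extract_windows` and `column_entropy`, the flank size, the `-` padding character and the toy peptide are all hypothetical choices made here.

```python
import math
from collections import Counter


def extract_windows(sequence, residues="STY", flank=3, pad="-"):
    """Return (position, window) pairs centered on each candidate residue.

    The sequence is padded on both sides (assumed convention) so that
    residues near the termini still yield windows of length 2 * flank + 1.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Index i in the sequence is index i + flank in the padded string,
            # so the window starts at padded index i.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def column_entropy(windows):
    """Shannon entropy (in bits) of each column across equal-length windows.

    Low entropy marks a conserved position; high entropy a variable one.
    """
    if not windows:
        return []
    entropies = []
    for col in range(len(windows[0])):
        counts = Counter(w[col] for w in windows)
        total = sum(counts.values())
        probs = [c / total for c in counts.values()]
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return entropies


# Toy peptide (hypothetical, not taken from any dataset in the paper).
sites = extract_windows("AASPKAATPRA", flank=2)
profile = column_entropy([window for _, window in sites])
print(sites)
print(profile)
```

In a traditional workflow, a profile like this would be one of several manually designed inputs to a classifier; a CNN trained on the raw windows is meant to learn comparable position-specific patterns without this manual step.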
