Post-translational Modification Site Prediction Research Articles

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of the different features and machine learning (ML) methods required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed method, Light Gradient Boosting Machine (LightGBM), are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/.
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
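To make the idea of encoding "residue sequences of Kmal sites" concrete, the following is a minimal sketch of how candidate lysine sites might be cut out of a protein sequence and turned into a binary (one-hot) feature vector. It is an illustration only and is not taken from the paper: the function names `lysine_windows` and `one_hot`, the flank size, and the use of `X` as a padding symbol are all assumptions, and one-hot encoding is used here simply as a common sequence encoding, not necessarily one of the 11 schemes the authors compared.

```python
from typing import List, Tuple

# Assumption: the 20 standard amino acids, in an arbitrary fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: "X" pads windows that run past either end of the sequence.
PAD = "X"


def lysine_windows(sequence: str, flank: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 and is centered on the lysine.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Index i in seq corresponds to index i + flank in padded,
            # so the window starts at padded index i.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """Encode a window as 20 binary values per residue; padding is all zeros."""
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, sum(one_hot(window)))
```

Vectors produced this way could then be passed to any of the classifiers named in the abstract; the per-model probabilities would be combined in a separate ensemble step, which is not shown here.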

Background: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can achieve high specificity at the cost of low sensitivity.

Results: We propose a method, RNAProB, which incorporates a new smoothed position-specific scoring matrix (PSSM) encoding scheme with a support vector machine model to predict RNA-binding sites in proteins. Besides incorporating evolutionary information from standard PSSM profiles, the proposed smoothed PSSM encoding scheme also considers the correlation and dependency from the neighboring residues for each amino acid in a protein. Experimental results show that smoothed PSSM encoding significantly enhances the prediction performance, especially for sensitivity. Using five-fold cross-validation, our method performs better than the state-of-the-art systems by 4.90% to 6.83%, 0.88% to 5.33%, and 0.10 to 0.23 in terms of overall accuracy, specificity, and Matthews correlation coefficient, respectively. Most notably, compared to other approaches, RNAProB significantly improves sensitivity by 7.0% to 26.9% over the benchmark data sets. To prevent overfitting, a three-way data split procedure is incorporated to estimate the prediction performance. Moreover, physicochemical properties and amino acid preferences of RNA-binding proteins are examined and analyzed.

Conclusion: Our results demonstrate that the smoothed PSSM encoding scheme significantly enhances the performance of RNA-binding site prediction in proteins. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity of discriminating between interacting and non-interacting residues by modelling the dependency from surrounding residues. The proposed method can be used in other research areas, such as DNA-binding site prediction, protein-protein interaction, and prediction of posttranslational modification sites.
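The abstract does not give the smoothing formula, so the following is only a sketch of one plausible reading of "smoothed PSSM": each residue's row is replaced by the sum of the PSSM rows inside a window centered on that residue, with the window truncated at the ends of the protein. The function name `smooth_pssm`, the default window size, and the truncation behavior at the termini are assumptions made for illustration, not details confirmed by the source.

```python
from typing import List

Matrix = List[List[float]]


def smooth_pssm(pssm: Matrix, window: int = 7) -> Matrix:
    """Sum each PSSM row with its neighbors inside a centered window.

    pssm:   one row per residue, one column per amino acid type.
    window: odd number of rows to combine (assumed value, not from the source).
    Rows near the termini use only the neighbors that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_rows = len(pssm)
    n_cols = len(pssm[0]) if n_rows else 0
    smoothed: Matrix = []
    for i in range(n_rows):
        lo = max(0, i - half)
        hi = min(n_rows, i + half + 1)
        # Column-wise sum over the rows lo .. hi - 1.
        row = [sum(pssm[k][j] for k in range(lo, hi)) for j in range(n_cols)]
        smoothed.append(row)
    return smoothed


if __name__ == "__main__":
    toy = [[1, 0], [2, 1], [3, 5]]  # toy 3-residue, 2-column matrix
    print(smooth_pssm(toy, window=3))
```

With `window=1` the function returns the original matrix unchanged, which makes it easy to compare a standard PSSM encoding against a smoothed one in the same pipeline.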

Articles published on Post-translational Modification Site Prediction

Semi-ssPTM: A Web Server for Species-Specific Lysine Post-Translational Modification Site Prediction by Semi-Supervised Domain Adaptation

Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction.

SSMFN: a fused spatial and sequential deep learning model for methylation site prediction.

DeepTL-Ubi: A novel deep transfer learning method for effectively predicting ubiquitination sites of multiple species

MusiteDeep: a deep-learning based webserver for protein post-translational modification site prediction and visualization.

SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting

Evaluation of nonsynonymous single nucleotide variations in NOS2 Gene identified through whole exome sequencing: A bioinformatics approach

Capsule network for protein post-translational modification site prediction.

Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework.

MLysPTMpred: Multiple Lysine PTM Site Prediction Using Combination of SVM with Resolving Data Imbalance Issue

New Achievements in Bioinformatics Prediction of Post Translational Modification of Proteins.

Prediction of post-translational modification sites using multiple kernel support vector machine.

ESA-UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives.

DAPPLE 2: a Tool for the Homology-Based Prediction of Post-Translational Modification Sites.

A homology-based pipeline for global prediction of post-translational modification sites.

Prediction of posttranslational modification sites from amino acid sequences with kernel methods

Retracted: Prediction of posttranslational modification sites from sequences with kernel methods

Predicting RNA-binding sites of proteins using support vector machines and evolutionary information

Intracellular Peptides as Natural Regulators of Cell Signaling

Homology of lysosomal enzymes and related proteins: Prediction of posttranslational modification sites including phosphorylation of mannose and potential epitopic and substrate binding sites in the α‐ and β‐subunits of hexosaminidases, α‐glucosidase, and rabbit and human isomaltase
