Abstract

Background: Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively inexpensive.

Results: In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction through the combination of Principal Component Analysis (PCA) and Support Vector Machine (SVM). One-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets, while SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec can achieve better prediction performance than other approaches. The AUC (area under the receiver operating characteristic curve) reached 96.47% and 90.72% on 5-fold cross-validation and independent data sets, respectively.

Conclusion: Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.
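For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted scores, using the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive site receives the higher score, with ties counted as half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the Mal-Prec code, which is written in MATLAB.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 1 (malonylated site) or 0 (non-malonylated site)
    scores: classifier scores, higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores, for illustration only)
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is adequate for a sketch; a sort-based implementation would be preferable for large data sets.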

Highlights

  • Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers

  • Determination of composition of k-spaced amino acid pairs (CKSAAP) features: although many approaches have adopted CKSAAP features to predict post-translational modification (PTM) sites, most of them used CKSAAP features generated from a single K value and did not identify the optimal K for constructing the CKSAAP feature

  • The best prediction performance was achieved when Principal Component Analysis (PCA) was used to reduce the dimensionality of the feature combination (CKSAAP, amino acid index properties (AAindex), and one-hot) to 100, rather than combining all of those features directly
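The CKSAAP and one-hot encodings mentioned in the highlights can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' MATLAB implementation: the function names, the example segment, and the choice of K values are all hypothetical, and the AAindex features and the PCA reduction to 100 dimensions are omitted. It follows the common definition of CKSAAP, in which each of the 400 ordered residue pairs separated by k positions is counted and divided by the number of such pairs in the segment.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 400 ordered residue pairs: "AA", "AC", ..., "YY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(segment, k):
    """400-dimensional composition of k-spaced amino acid pairs for one k."""
    n_pairs = len(segment) - k - 1          # number of k-spaced positions
    if n_pairs <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_pairs):
        pair = segment[i] + segment[i + k + 1]
        if pair in counts:                  # skip gaps or unknown residues
            counts[pair] += 1
    return [counts[p] / n_pairs for p in PAIRS]


def cksaap_multi(segment, k_values):
    """Concatenate CKSAAP vectors for several k values (400 per k)."""
    features = []
    for k in k_values:
        features.extend(cksaap(segment, k))
    return features


def one_hot(segment):
    """20 binary indicators per residue; unknown residues give all zeros."""
    features = []
    for residue in segment:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


# Hypothetical 11-residue segment centred on a lysine (K)
example_segment = "MAVLGKSTQDE"
example_features = cksaap_multi(example_segment, k_values=(0, 1, 2)) + one_hot(example_segment)
```

Using several K values multiplies the CKSAAP dimensionality by the number of K values, which is one reason a dimensionality-reduction step such as PCA becomes useful before classification.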

Summary

Introduction

Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Wang et al. [10] built a predictor called MaloPred, which took into account five features: amino acid compositions (AAC), amino acid binary encoding (BINA), encoding based on grouped weight (EBGW), the K nearest neighbors feature (KNN), and the position specific scoring matrix (PSSM). The information gain (IG) of these features was evaluated to select the most meaningful and significant ones. Hasan and Kurata [11] proposed a prediction tool called identification of Lysine-Malonylation Sites (iLMS), which used the composition of profile-based k-Spaced Amino Acid Pairs (pkSAAP), dipeptide amino acid compositions (DC), and amino acid index properties (AAindex) to encode each sequence segment. Although many achievements have been made in the prediction of malonylation sites, there is still much room for improvement in prediction performance.
