Abstract

BackgroundGlutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites.ResultsThe TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites, according to support vector machine (SVM).ConclusionsThe SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/.

Highlights

  • Glutarylation, the addition of a glutaryl group to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes

  • If the lysine residue has not been annotated as a glutarylation site, the fragments were regarded as the negative dataset

  • Investigation and encoding of sequence-based features The research in this study focused on the analysis of sequence-based features including amino acid composition (AAC), amino acid pair composition (AAPC), and Sequence identity cut-off

Read more

Summary

Introduction

Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. Crucial protein PTM occurring at the ε-amino groups of specific lysine residues (K) have proven to be hallmarks of active chromatin, and major regulators of gene expression, protein-protein interactions, and protein processing and degradation. These include 2-hydroxyisobutyrylation [2], acetylation [3], butyrylation [4], crotonylation [5], malonylation [6], propionylation [7] and succinylation [8]. Malonylation, succinylation, and glutarylation, which are collectively referred to as lysine acylation, are highly dynamic PTMs and appear conserved across evolutionarily related species [3]

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call