Abstract

Succinylation is an important posttranslational modification of proteins that plays a key role in regulating protein conformation and controlling cellular function. Many studies have shown that succinylation of lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which combines the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the LightGBM feature selection method is combined with IFS to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computational load, the Bayesian optimization algorithm is used to tune the parameters of the LightGBM classifier. The results show that the IFS-LightGBM (BO)-based prediction model performs better when evaluated with common metrics such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure.
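
The incremental feature selection step can be illustrated with a minimal sketch. It assumes the features have already been ranked from most to least important (in the study, the ranking comes from LightGBM) and that a scoring function returns a quality measure for any feature subset. The function name, the `evaluate` callback, and the toy scores below are hypothetical and are not taken from the paper; in practice `evaluate` would be something like the cross-validated performance of a classifier trained on the subset.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Evaluate the nested subsets top-1, top-2, ... and keep the best one.

    ranked_features: feature names ordered from most to least important.
    evaluate: hypothetical callback mapping a feature list to a score
              (higher is better).
    """
    best_subset, best_score = [], float("-inf")
    history = []  # (subset size, score) for every candidate subset
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        score = evaluate(subset)
        history.append((n, score))
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = list(subset), score
    return best_subset, best_score, history


# Toy stand-in for a classifier score: two helpful and two harmful features.
toy_gain = {"f1": 0.30, "f2": 0.20, "f3": -0.05, "f4": -0.10}


def toy_score(subset):
    return 0.5 + sum(toy_gain[f] for f in subset)


ranked = sorted(toy_gain, key=toy_gain.get, reverse=True)
subset, score, history = incremental_feature_selection(ranked, toy_score)
```

With the toy scores, the search stops improving once the harmful features enter the subset, so only the two helpful features are kept.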

Highlights

  • Posttranslational modification (PTM) is the chemical modification of the precursor protein after translation, such as the addition of a small molecule protein or the introduction of a functional group, so that the inactive precursor protein can obtain biological functions

  • Composition of k-spaced Amino Acid Pairs (CKSAAP) characterizes the short sequence motif information of polypeptide segments, position-specific scoring matrix (PSSM) reflects evolutionary information, disorder features reflect natively disordered residues recognized by VSL2B [35], and pseudo amino acid composition (PseAAC) reflects the physicochemical information of polypeptide segments

  • We propose IFS-LightGBM (BO), a machine learning model based on incremental feature selection (IFS), for the prediction of succinylation sites
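
As an illustration of the CKSAAP encoding mentioned above, the following is a minimal sketch of the commonly used definition: for each gap k, count every ordered residue pair separated by k positions and divide by the number of such windows in the fragment. The function name, the default `k_max`, and the handling of non-standard residues are assumptions; the fragment length and gap range used in the study are not stated here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Return the CKSAAP vector of a peptide fragment for gaps k = 0..k_max.

    The result has 400 * (k_max + 1) entries: one frequency per ordered
    residue pair and per gap size.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = len(fragment) - k - 1  # pairs separated by k residues
        for i in range(n_windows):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # assumption: ignore padding or non-standard residues
                counts[pair] += 1
        total = max(n_windows, 1)  # avoid division by zero on short fragments
        features.extend(counts[p] / total for p in PAIRS)
    return features


vector = cksaap("AKA", k_max=1)
```

For the fragment `AKA`, the gap-0 block holds the pairs `AK` and `KA`, and the gap-1 block holds the single pair `AA`.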


Introduction

Posttranslational modification (PTM) is the chemical modification of the precursor protein after translation, such as the addition of a small molecule protein or the introduction of a functional group, so that the inactive precursor protein can obtain biological functions. Proteins undergo many forms of PTM, such as ubiquitination, glutarylation, sumoylation, palmitoylation, acetylation, and methylation. Succinylation is a PTM that occurs on lysine. It is broadly conserved, exists in both prokaryotic and eukaryotic cells, and can coordinate various biological processes [2,3,4]. Compared with the methylation and acetylation that occur on lysine, succinylation causes more substantial changes in the chemical structure of lysine [5]. Succinylated proteins are involved in a variety of cell functions, including metabolism and epigenetic regulation.
