Detecting Succinylation sites from protein sequences using ensemble support vector machine

Qiao Ning, Lingling Bao, Zhiqiang Ma, Xiaowei Zhao, Xiaosa Zhao

doi:10.1186/s12859-018-2249-4

Open Access

https://doi.org/10.1186/s12859-018-2249-4

Journal: BMC Bioinformatics
Publication Date: Jun 25, 2018
Citations: 37
License type: open-access

Affiliation: Northeast Normal University

Abstract

Background: Lysine succinylation is a new kind of post-translational modification that plays a key role in regulating protein conformation and controlling cellular function. To understand the mechanism of succinylation in depth, succinylation sites in proteins must be identified accurately. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years and are popular because of their convenience and speed. In this study, we developed a new method to predict succinylation sites in proteins that combines multiple features (amino acid composition, binary encoding, physicochemical properties, and grey pseudo amino acid composition) with a feature selection scheme based on information gain. The predictor was then trained using a support vector machine (SVM) and an ensemble learning algorithm.

Results: The method achieved an accuracy of 89.14% and a Matthews correlation coefficient (MCC) of 0.79 in 10-fold cross-validation on the training dataset, and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset.

Conclusions: The conclusions of this study can help in understanding the succinylation mechanism. These results suggest that our method is promising for predicting succinylation sites. The source code and data of this paper are freely available at https://github.com/ningq669/PSuccE.
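The abstract reports both accuracy and MCC, and on the independent dataset the two differ sharply (84.5% versus 0.2). The sketch below shows how both metrics are computed from a confusion matrix and how a large gap can arise when the classes are imbalanced. The counts are invented for illustration and are not taken from the paper; the function names are likewise assumptions.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical, imbalanced test set: 100 positive sites, 900 negative sites.
# These numbers are illustrative only and do not come from the paper.
tp, fn = 30, 70    # most true sites are missed
tn, fp = 870, 30   # most non-sites are correctly rejected

print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Because negatives dominate this hypothetical set, accuracy stays high even though most positives are missed, while MCC, which uses all four cells of the confusion matrix, is much lower.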

Highlights

Lysine succinylation is a new kind of post-translational modification which plays a key role in protein conformation regulation and cellular function control
To address the problems mentioned above, we developed a new predictor of succinylation sites in proteins, built on the same data set as SuccinSite
Following Chou’s 5-step rule [38], as demonstrated in a series of recent publications [6,7,8,9,10,11,12], a useful sequence-based predictor for a biological system should be established according to five guidelines: (a) select or construct a valid benchmark data set to train and test the predictor; (b) formulate the protein sequence samples with an effective mathematical expression that truly reflects their intrinsic correlation with the target to be predicted; (c) introduce or develop a powerful algorithm to operate the prediction; (d) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (e) establish a user-friendly web-server for the predictor that is accessible to the public
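Guideline (b) can be illustrated with two of the feature types named in the abstract, amino acid composition and binary encoding. The following is a minimal sketch, not the authors' implementation: the window half-width (`flank=3`), the padding character `X`, the residue ordering, and all function names are assumptions made for illustration.

```python
# The 20 standard amino acids; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def extract_window(sequence, site, flank=3, pad="X"):
    """Return the peptide of length 2*flank+1 centred on sequence[site].

    `site` is a 0-based index; positions beyond either end are padded.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site:site + 2 * flank + 1]

def amino_acid_composition(peptide):
    """20-dimensional vector of residue frequencies (padding ignored)."""
    residues = [r for r in peptide if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]

def binary_encoding(peptide):
    """One-hot encode each position; padding becomes an all-zero block."""
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            one_hot[index] = 1
        vector.extend(one_hot)
    return vector

# Hypothetical sequence with a lysine (K) at index 1.
window = extract_window("MKTAYIAK", site=1)
features = amino_acid_composition(window) + binary_encoding(window)
print(window, len(features))
```

In a full pipeline, vectors like `features` would be ranked by a feature selection score and passed to the classifier; those later steps are omitted here.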