Abstract

Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.

Highlights

  • Ubiquitylation is a universal and important post-translational modification where ubiquitin is linked to some lysine residues of target proteins [1,2,3], and forms an isopeptide bond between the ε-amino groups of lysine residues of a substrate protein and the C-terminal double-glycine carboxy groups of ubiquitin protein [4,5]

  • Since identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems, many post-genome era researchers have focused on this field [9,10,11]

  • We evaluate the prediction performance of the ensemble classifier on the training set constructed in this study which consists of 298 ubiquitylation sites and 563 non-ubiquitylation sites

Read more

Summary

Introduction

Ubiquitylation is a universal and important post-translational modification where ubiquitin is linked to some lysine residues of target proteins [1,2,3], and forms an isopeptide bond between the ε-amino groups of lysine residues of a substrate protein and the C-terminal double-glycine carboxy groups of ubiquitin protein [4,5]. Since identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems, many post-genome era researchers have focused on this field [9,10,11]. Some high-throughput experimental technologies have been developed to analyze and model the lysine ubiquitylation process at a genomic scale, such as proteolytic digestion, three steps affinity purification, and analysis using mass spectrometry [12]. These conventional experiment approaches are labor-intensive and time-consuming, especially for large-scale data sets. Tung and Ho built a prediction model, UbiPred, by using an informative subset of 531 physicochemical properties and a support vector machine.

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.