Glycosylation site prediction using ensembles of Support Vector Machine classifiers.

Cornelia Caragea, Drena Dobbs, Vasant Honavar, Adrian Silvescu, Jivko Sinapov

doi:10.1186/1471-2105-8-438

Open Access

Journal: BMC Bioinformatics
Publication Date: Nov 9, 2007
Citations: 189
License type: CC BY 2.0

Affiliation: Iowa State University

Abstract

Background: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.

Results: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction.

Conclusion: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.

Highlights

Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells.
An ensemble of Support Vector Machines outperforms a single Support Vector Machine trained on unbalanced data on the glycosylation site prediction task.
For each glycosylation type considered in this study (N-linked, O-linked, and C-linked), we trained ensembles of Support Vector Machine (SVM) classifiers to predict whether or not a site in a protein sequence is a glycosylation site.
An ensemble of Support Vector Machines outperforms a single Support Vector Machine trained on balanced data on the glycosylation site prediction task.
For each glycosylation type considered in this study, we compared the performance of the ensemble of SVM classifiers with that of a single SVM classifier trained on a balanced training set and evaluated on a test set.
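The highlights contrast an ensemble of SVMs with single SVMs trained on balanced or unbalanced data, but this page does not say how the ensemble is built. The sketch below shows one common construction, offered only as an assumption: the negative (non-glycosylated) examples are split into chunks the size of the positive set, each chunk is paired with all positives to form a balanced training set, one base classifier is trained per set, and predictions are combined by majority vote. The names `balanced_subsets`, `train_ensemble`, `predict_majority`, and the `train` callback are hypothetical; any SVM trainer with the stated signature could be plugged in.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (feature vector, label in {0, 1})
Classifier = Callable[[Sequence[float]], int]  # maps a feature vector to a label
Trainer = Callable[[List[Example]], Classifier]


def balanced_subsets(positives: List[Example],
                     negatives: List[Example],
                     seed: int = 0) -> List[List[Example]]:
    """Pair all positives with successive chunks of shuffled negatives.

    Assumption (not taken from the paper): chunks are as large as the
    positive set, so each subset is roughly balanced. The last chunk may
    be smaller if the negatives do not divide evenly.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = max(1, len(positives))
    return [list(positives) + shuffled[i:i + size]
            for i in range(0, len(shuffled), size)]


def train_ensemble(positives: List[Example],
                   negatives: List[Example],
                   train: Trainer,
                   seed: int = 0) -> List[Classifier]:
    """Train one base classifier per balanced subset."""
    return [train(subset)
            for subset in balanced_subsets(positives, negatives, seed)]


def predict_majority(ensemble: List[Classifier], x: Sequence[float]) -> int:
    """Return 1 if strictly more members vote 1 than 0; ties go to 0."""
    votes = Counter(clf(x) for clf in ensemble)
    return 1 if votes[1] > votes[0] else 0
```

Here `train` is left abstract on purpose: it takes a list of labeled examples and returns a callable classifier, so the same ensemble logic works whether the base learner is an SVM from a library or a simple stand-in used for testing.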

Summary

Introduction

Glycosylation is one of the most complex and ubiquitous post-translational modifications (PTMs) of proteins in eukaryotic cells. It is a dynamic enzymatic process in which saccharides are attached to proteins or lipoproteins, usually on serine (S), threonine (T), asparagine (N), and tryptophan (W) residues. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization to ligand recognition and cell-cell interactions. There is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences. O-GlycBase [16] provides a dataset of experimentally determined glycosylation sites for training classifiers to predict them.
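To make the prediction setup concrete, the sketch below enumerates candidate sites in a protein sequence and extracts the target residue together with its sequence neighbors as a fixed-length window. The mapping of linkage type to residue (N-linked on asparagine, O-linked on serine or threonine, C-linked on tryptophan) follows the residues named above. The function name `candidate_windows`, the default half-width of 10, and the `X` padding character are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, Iterator, Tuple

# Candidate target residues per glycosylation type.
CANDIDATES: Dict[str, str] = {"N": "N", "O": "ST", "C": "W"}


def candidate_windows(seq: str,
                      linkage: str,
                      half_width: int = 10,
                      pad: str = "X") -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, window) for every candidate residue.

    Each window holds the target residue plus `half_width` neighbors on
    each side. Positions beyond the sequence ends are filled with `pad`
    (a hypothetical choice) so all windows have the same length.
    """
    targets = CANDIDATES[linkage]
    padded = pad * half_width + seq + pad * half_width
    for i, residue in enumerate(seq):
        if residue in targets:
            # In `padded`, the target sits at index i + half_width,
            # so the window starts at index i.
            yield i, padded[i:i + 2 * half_width + 1]
```

For example, `list(candidate_windows("MNSTW", "O", half_width=2))` yields one window centered on the serine at position 2 and one on the threonine at position 3; each window could then be encoded as a feature vector and labeled as glycosylated or not using annotations such as those in O-GlycBase.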
