Abstract

Glycosylation is one of the most abundant post-translational modifications (PTMs) and is required for modulating the structure and function of proteins in a living cell. Although elucidated only recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein-glycan linkages are widespread: N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Owing to their ubiquity, biological significance, and technological applications, prokaryotic glycoproteins are a fast-emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models use binary profile of patterns, composition profile of patterns, and position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from phyla Crenarchaeota and Euryarchaeota (domain Archaea) and Proteobacteria (domain Bacteria), and validated O-glycosites from phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed using information on predicted secondary structures and accessible surface area in various combinations with the training features. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively.
An evaluation of the best-performing models on 28 independent prokaryotic glycoproteins confirms that these models are suitable for predicting N- and O-glycosites in potential glycoproteins from the aforementioned organisms with reasonably high confidence. A web server, GlycoPP, implementing these models is freely available at http://www.imtech.res.in/raghava/glycopp/.
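The accuracy and Matthews correlation coefficient (MCC) figures quoted above are standard functions of a binary confusion matrix. The following is a minimal sketch of how both are computed; the function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero (undefined denominator).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    tp, tn, fp, fn = 80, 90, 10, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when glycosylated and non-glycosylated sites are unevenly represented.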

Highlights

  • Glycosylation is a recently identified post-translational modification of proteins in prokaryotes: Archaea and Bacteria [1,2]

  • We have developed a number of Support Vector Machine (SVM) models using three types of features namely, binary profile of patterns (BPP), composition profile of patterns (CPP), and PSI-BLAST generated position specific scoring matrix (PSSM) profile of patterns (PPP) to recognize and differentiate glycosylated sequence contexts from non-glycosylated contexts in prokaryotic glycoproteins

  • We conclude that methods trained on eukaryotic glycoproteins are not optimal for predicting potential glycosites in bacterial and archaeal proteins
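As a rough illustration of the first two feature types named in the highlights, the sketch below encodes a sequence window centred on a candidate residue as a binary profile (one-hot per position) and as a composition profile (residue frequencies). The window size, the padding symbol `X`, the 21-symbol alphabet, and all function names are assumptions made for this example; the source does not specify these details here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
PAD = "X"  # assumed dummy residue for positions beyond the sequence ends


def window(seq, pos, flank):
    """Return the pattern of length 2*flank + 1 centred on 0-based `pos`."""
    padded = PAD * flank + seq + PAD * flank
    return padded[pos : pos + 2 * flank + 1]


def binary_profile(pattern):
    """One-hot encode each position over the 20 residues plus PAD.

    Symbols outside the alphabet yield an all-zero block for that position.
    """
    alphabet = AMINO_ACIDS + PAD
    vec = []
    for res in pattern:
        vec.extend(1 if res == a else 0 for a in alphabet)
    return vec


def composition_profile(pattern):
    """Fraction of the pattern occupied by each of the 20 standard residues."""
    return [pattern.count(a) / len(pattern) for a in AMINO_ACIDS]


if __name__ == "__main__":
    seq = "MNGTSA"  # toy sequence, not from the dataset
    pat = window(seq, pos=1, flank=2)  # window centred on the Asn at index 1
    print(pat)
    print(len(binary_profile(pat)), len(composition_profile(pat)))
```

Vectors of this kind, one per candidate Asn or Ser/Thr/Tyr, could then be passed to any SVM implementation as training features.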

Introduction

Glycosylation is a recently identified post-translational modification of proteins in prokaryotes (Archaea and Bacteria) [1,2]. It is known to influence biological properties such as activity, solubility, folding, conformation, stability, half-life, and/or immunogenicity of cellular proteins, thereby modulating their structure and function for a variety of cellular and extracellular roles [3,4,5]. Owing to their involvement in host-pathogen interactions, immunogenicity, and many other important cellular functions, a number of bacterial and archaeal glycoproteins have been characterized experimentally [6,7,8,9,10]. A number of algorithms have been developed to predict glycosites in eukaryotic glycoproteins using different machine learning tools: Neural Network based (NetOglyc) [11,12], Support Vector Machine (SVM) based (NetNglyc) [13], Ensemble of SVMs (EnsembleGly) [14], and Random Forest based [15]. All these existing tools are trained on eukaryotic glycoprotein sequences. In this study using a dataset of experimentally validated 107 N-linked and
