Abstract
Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Investigating the molecular basis of OGT's substrate specificity should aid understanding of how O-GlcNAc contributes to diverse cellular processes. Motivated by the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we set out to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was trained as a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to build a second-layer model from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method achieves a promising accuracy (84.05%) and outperforms other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method is a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
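The sensitivity, specificity and accuracy quoted in the abstract follow the standard confusion-matrix definitions. The sketch below shows how such figures are typically computed; the function name and the counts are hypothetical and chosen only for illustration, they are not the study's data.

```python
def evaluation_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: true sites predicted as positive/negative.
    tn/fp: non-sites predicted as negative/positive.
    """
    sensitivity = tp / (tp + fn)               # fraction of true sites recovered
    specificity = tn / (tn + fp)               # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all calls that are correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for one cross-validation fold (illustration only).
sn, sp, acc = evaluation_metrics(tp=80, fn=20, tn=90, fp=10)
print(f"Sn={sn:.3f} Sp={sp:.3f} Acc={acc:.3f}")
```

In a five-fold cross-validation these values would usually be computed per fold and then averaged, although the abstract does not state how the reported figures were aggregated.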
Highlights
A type of O-linked glycosylation, protein O-GlcNAcylation (O-GlcNAc), attaches a single N-acetylglucosamine (GlcNAc) to serine (Ser)/threonine (Thr) residues [1].
This study aims to investigate the O-GlcNAc transferase (OGT) substrate motifs based on the amino acid composition surrounding O-GlcNAcylation sites.
Another characteristic feature is the depletion of proline (P) and leucine (L) at positions +1 and +2, respectively, immediately downstream of the O-GlcNAcylation sites.
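The position-specific composition mentioned in the highlights can be illustrated with a small sketch that cuts a fixed-length window around each modified Ser/Thr and tabulates residue frequencies per relative position. The helper names, the flank length, the padding character and the toy sequence are all assumptions made for illustration; the source does not specify the window size or implementation used.

```python
from collections import Counter


def extract_window(sequence, site_index, flank=6, pad="-"):
    """Return the 2*flank+1 residues centred on site_index (0-based).

    Positions that fall outside the protein are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    centre = site_index + flank
    return padded[centre - flank: centre + flank + 1]


def positional_frequencies(windows, flank=6, pad="-"):
    """Map each relative position (-flank..+flank) to residue frequencies."""
    freqs = {}
    for offset in range(-flank, flank + 1):
        # Collect the residue at this relative position, ignoring padding.
        column = [w[offset + flank] for w in windows if w[offset + flank] != pad]
        counts = Counter(column)
        total = len(column)
        freqs[offset] = {aa: n / total for aa, n in counts.items()} if total else {}
    return freqs


# Toy sequence with two hypothetical modified sites (S at index 3, T at index 6).
toy_sequence = "MPVSAPTSAK"
windows = [extract_window(toy_sequence, i, flank=2) for i in (3, 6)]
freqs = positional_frequencies(windows, flank=2)
print(windows)
print(freqs[1])  # residue frequencies at position +1
```

A depletion such as the one described for P at +1 would show up as a frequency at that position well below the background frequency of the residue, which would need to be estimated separately from non-modified Ser/Thr windows.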
Summary
A type of O-linked glycosylation, protein O-GlcNAcylation (O-GlcNAc), attaches a single N-acetylglucosamine (GlcNAc) to serine (Ser)/threonine (Thr) residues [1]. To identify O-GlcNAcylation sites more effectively and reduce experimental effort, computational prediction of site motifs and O-GlcNAcylation sites has been explored. Gupta and Brunak developed YinOYang, an O-GlcNAcylation prediction tool trained using 40 O-GlcNAcylation sites [20]. Chen et al. developed a similar tool that incorporates structural topology to identify O-glycosylation sites on transmembrane proteins [21]. The increase in experimentally identified O-GlcNAcylation sites has motivated new developments, including OGlcNAcScan, which was trained using 373 O-GlcNAcylation sites [22]. Meanwhile, Caragea et al. demonstrated that ensembles of support vector machine (SVM) classifiers could outperform a single SVM classifier in predicting protein glycosylation sites [24].