Abstract

O-glycosylation of mammalian proteins is an important posttranslational modification. To help elucidate the O-glycosylation mechanism, we applied a support vector machine (SVM) to predict whether a given Ser or Thr residue is glycosylated. O-glycosylated sites were often clustered along the sequence, whereas other sites occurred sporadically. We therefore developed two types of SVMs to predict clustered and isolated sites separately. Amino acid composition was effective for predicting the clustered type, whereas the site-specific algorithm was effective for the isolated type. The highest prediction accuracy was 74% for the clustered type and 79% for the isolated type. The frequency of amino acids around the O-glycosylation sites differed between the two types: Pro, Val and Ala occurred with high probability at specific positions relative to a glycosylation site, especially for the isolated type. Independent component analysis of the amino acid sequences around O-glycosylation sites showed the position-specific occurrence of the identified amino acids as independent components. The O-glycosylation sites were preferentially located within intrinsically disordered regions of extracellular proteins; in particular, more than 90% of the clustered O-GalNAc glycosylation sites were observed in intrinsically disordered regions. This feature could be key to understanding the non-conserved nature of O-glycosylation and its role in functional diversity and structural stability.

Highlights

  • Glycan, a carbohydrate chain, is considered the third life chain after DNA and protein [1]

  • Two types of information were used: one was the amino acid sequence encoded by sparse coding, which distinguished all 20 types of amino acids, while the other was the amino acid composition of the sequence

  • In [23], O-GalNAc glycosylation sites were predicted using a layered neural network; that study indicated that bulk average properties, including amino acid composition, give the best prediction
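The two kinds of input described in the highlights (a sparse, position-specific encoding and an amino acid composition) can be illustrated with a minimal sketch. The window half-width, the `-` padding symbol, the example sequence, and all function names below are assumptions made for illustration; the source does not state which settings the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def window_around(sequence, site, half_width):
    """Residues from site - half_width to site + half_width (0-based site).

    Positions past either terminus are padded with '-' (an assumed convention).
    """
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(site - half_width, site + half_width + 1)
    )


def sparse_encode(window):
    """Concatenate one 20-element 0/1 vector per window position.

    Each residue sets exactly one element; padding yields all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def composition(window):
    """Fraction of each amino acid among the non-padding residues."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero for an all-padding window
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical sequence; the Thr at 0-based index 3 is the candidate site.
example_sequence = "MAPTSTPAPV"
example_window = window_around(example_sequence, site=3, half_width=2)
position_features = sparse_encode(example_window)
composition_features = composition(example_window)
```

The sparse vector keeps track of which residue sits at which offset from the site, while the composition vector discards position and keeps only the residue fractions. Either vector could then be passed to an SVM implementation as the feature input.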

Introduction

Glycan, a carbohydrate chain, is considered the third life chain after DNA and protein [1]. The two major types of protein glycosylation in eukaryotes are N-linked and O-linked glycosylation. N-linked glycans are attached to the amide nitrogens of asparagine (Asn) side chains in the consensus sequences Asn-Xaa-Ser or Asn-Xaa-Thr, where Xaa represents any amino acid residue except proline (Pro) [15,16]. O-linked glycans are attached to the hydroxyl group of serine (Ser) or threonine (Thr) side chains [17]. O-linked glycosylation (O-glycosylation) encompasses several different types of glycosylation, such as O-GalNAc, O-GlcNAc, O-Fuc, O-Glc, O-Man, and O-Xyl glycosylation. The most common O-glycosylation is O-GalNAc glycosylation, or mucin-type
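The contrast between the two attachment rules can be made concrete with a short sketch. N-linked sites follow the Asn-Xaa-Ser/Thr consensus (Xaa not Pro), so a pattern scan finds candidates; O-linked sites have no such consensus in the text above, so every Ser or Thr is only a candidate. The function names and the example sequence are hypothetical, and a pattern match indicates a possible site, not a confirmed glycosylation.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") both be reported.
N_SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sequons(sequence):
    """0-based positions of Asn residues that start an Asn-Xaa-Ser/Thr sequon."""
    return [match.start() for match in N_SEQUON.finditer(sequence)]


def o_glycosylation_candidates(sequence):
    """0-based positions of every Ser or Thr; these are candidates only."""
    return [i for i, residue in enumerate(sequence) if residue in "ST"]


# Hypothetical sequence for illustration.
example = "ANVSNPTGNNTS"
print(n_glycosylation_sequons(example))     # [1, 8, 9]
print(o_glycosylation_candidates(example))  # [3, 6, 10, 11]
```

The Asn at index 4 is skipped because it is followed by Pro. The absence of a comparable pattern for Ser and Thr is what motivates a learned predictor for O-glycosylation sites.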
