Abstract

Protein N-linked glycosylation is a post-translational modification that plays an important role in many biological processes. Computational prediction approaches serve as complementary methods for characterizing glycosylation sites. Most existing predictors for N-linked glycosylation exploit the fact that glycosylation occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated, so the sequon is a necessary but not sufficient determinant of protein glycosylation. Predicting which N-X-[S/T] sequons are actually glycosylated is therefore an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred achieves a sensitivity (SN), specificity (SP), Matthews correlation coefficient (MCC), and accuracy (ACC) of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique for predicting N-linked glycosylation sites confined to N-X-[S/T] sequons. DeepNGlyPred will be a useful resource for the glycobiology community.

Highlights

  • Protein N-linked glycosylation is one of the most important post-translational modifications (PTMs). It plays essential roles in many vital biological processes, such as protein folding, protein stability, cell adhesion, molecular trafficking and clearance, receptor binding and activation, signal transduction, immune response, antigenicity, and apoptosis [1,2,3,4,5,6,7,8,9,10,11], in eukaryotes, archaea, and Gram-negative bacteria

  • We developed two versions of DeepNGlyPred, one based on the N-GlyDE dataset and one based on the N-GlycositeAtlas dataset

  • DeepNGlyPred uses sequence-based and structure-based features generated with NetSurfP-2.0, PSI-BLAST, and gapped-dipeptide encoding
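
As an illustration of the gapped-dipeptide idea mentioned above, here is a minimal sketch of one common formulation: for a given gap g, count every ordered residue pair separated by exactly g intervening positions. The function name, the choice of gap, and the use of raw counts are assumptions for illustration; the exact encoding, window size, and normalization used by DeepNGlyPred are not stated in this summary.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def gapped_dipeptide_counts(peptide, gap):
    """Count residue pairs (a, b) separated by `gap` intervening residues.

    Returns a dict with one entry per ordered pair (400 in total).
    Pairs containing non-standard residues are skipped.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(peptide) - gap - 1):
        pair = peptide[i] + peptide[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
    return counts
```

With `gap=1`, the peptide `"ANAS"` yields one `AA` pair (positions 1 and 3) and one `NS` pair (positions 2 and 4). Concatenating the count vectors for several gap values gives a fixed-length feature vector regardless of which residues appear in the window.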


Summary

Introduction

Protein N-linked glycosylation is one of the most important post-translational modifications (PTMs). It plays essential roles in many vital biological processes, such as protein folding, protein stability, cell adhesion, molecular trafficking and clearance, receptor binding and activation, signal transduction, immune response, antigenicity, and apoptosis [1,2,3,4,5,6,7,8,9,10,11], in eukaryotes, archaea, and Gram-negative bacteria. The presence of an N-X-[S/T] sequon in a peptide is not enough to confirm that the site is N-linked glycosylated, because about one-third to one-half of sequons are buried deep inside proteins and are not accessible to glycosylation enzymes [17,18,19,20]. The presence of a sequon is therefore necessary but not sufficient for N-linked glycosylation in both prokaryotes and eukaryotes [6,17,20,21].
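
Because the sequon is necessary but not sufficient, a predictor first has to enumerate candidate sites before classifying them. The following is a minimal sketch of that enumeration step, assuming a plain one-letter amino acid string as input; the function name `find_sequons` is hypothetical and this is not the authors' code.

```python
import re

# N, followed by any residue except proline, followed by S or T.
# The lookahead keeps the match to the asparagine alone, so overlapping
# sequons (for example in "NNSS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in N-X-[S/T] sequons (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```

On the toy sequence `"MNGSANPTNNSS"` this returns `[2, 9, 10]`: the asparagine at position 6 is skipped because it is followed by proline. Every position returned is only a candidate, and deciding which candidates are actually glycosylated is the prediction problem described above.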

