Abstract

Glycosylation is the most complex post-translational modification of proteins. It participates in many biological processes in the human body and is closely associated with many disease states. Among the glycosylation types, N-linked glycosylation accounts for the largest share of available glycosylation data. However, current N-linked glycosylation prediction tools do not account for the severe imbalance between positive and negative data. In this study, we used protein sequence and amino acid characteristics to construct an N-linked glycosylation prediction model called N-GlycoGo. Eleven heterogeneous features were encoded based on sequence, structure, and function, and XGBoost was selected for modeling. In independent testing, the human and mouse prediction models of N-GlycoGo achieved Matthews correlation coefficient (MCC) values of 0.397 and 0.719, respectively, higher than those of other glycosylation site prediction tools. N-GlycoGo is thus a fast and accurate prediction tool for N-linked glycosylation, available at http://ncblab.nchu.edu.tw/n-glycogo/.
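For readers less familiar with the metric, the following minimal Python sketch computes MCC from confusion-matrix counts using the standard definition MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the usage lines are invented for illustration (they are not N-GlycoGo's test data). They show why accuracy alone is misleading on an imbalanced set: a classifier that labels every site negative scores high accuracy but an MCC of 0.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 50 positive sites, 950 negative sites.
# A trivial classifier that predicts "negative" for every site:
trivial = dict(tp=0, tn=950, fp=0, fn=50)
print(accuracy(**trivial))  # 0.95
print(mcc(**trivial))       # 0.0

# A hypothetical classifier that actually finds some positive sites:
useful = dict(tp=30, tn=900, fp=50, fn=20)
print(accuracy(**useful), round(mcc(**useful), 3))
```

The second classifier has slightly lower accuracy than the trivial one yet a clearly positive MCC, which is the behavior that makes MCC preferable when positive sites are rare.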

Highlights

  • Glycosylation is the most complex and common post-translational modification and involves the enzymatic attachment of sugars to proteins

  • The main purpose of an ensemble method is to improve on the performance of a single classifier

  • N-GlycoGo is based on an ensemble learning model
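The second highlight can be illustrated with a simple calculation. Assume, unrealistically, that n classifiers make independent errors and each is correct with probability p. A majority vote is then correct whenever more than half of them are, and for p > 0.5 that probability exceeds p. This sketch models a plain voting ensemble, not the gradient boosting used by XGBoost, and the function name is hypothetical:

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent classifiers,
    each correct with probability p, gives the correct label.

    n must be odd so that a binary vote cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct voters that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7, three voters already reach 0.784. In practice, classifiers trained on the same data have correlated errors, so real gains are smaller. Boosting methods such as XGBoost take a different route, fitting trees sequentially so that each new tree corrects the errors of the earlier ones.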


Summary

Introduction

Glycosylation is the most complex and common post-translational modification and involves the enzymatic attachment of sugars to proteins. To control and predict glycosylation, various approaches have been used, including genetic or cell culture modification methods [3], dynamic models [4], genetic engineering [5], and genome models [6]. Building these models requires both computational tools and biological experiments, and parameter tuning, training, and repeated experiments take considerable time, especially for mechanistic kinetic models [7]. Although these technologies are highly accurate, the instruments they require are expensive. Several prediction tools use amino acid sequences to predict post-translational modification sites.
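N-linked glycans are attached to asparagine within the N-X-S/T sequon, where X is any residue except proline. Sequence-based predictors therefore commonly start by locating candidate asparagines and cutting a fixed-length fragment around each one. The source does not describe N-GlycoGo's encoding beyond "11 heterogeneous features," so the sketch below is hypothetical. The window half-width, the padding character X, the function names, and the use of one-hot encoding are all assumptions chosen for illustration:

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical placeholder for positions beyond either sequence end

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def candidate_sites(seq: str) -> list[int]:
    """0-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() for m in SEQUON.finditer(seq)]


def window(seq: str, pos: int, half: int = 7) -> str:
    """Fragment of length 2*half + 1 centred on pos, padded with PAD at the ends."""
    padded = PAD * half + seq + PAD * half
    # seq[pos] sits at padded[pos + half], so this slice is centred on it.
    return padded[pos : pos + 2 * half + 1]


def one_hot(fragment: str) -> list[int]:
    """Flat 20-dimensional one-hot vector per residue; PAD maps to all zeros."""
    vec: list[int] = []
    for aa in fragment:
        vec.extend(1 if aa == a else 0 for a in AMINO_ACIDS)
    return vec


seq = "MKNGTAVNPSLLNVT"  # toy sequence, not real data
for pos in candidate_sites(seq):
    frag = window(seq, pos, half=3)
    print(pos, frag, len(one_hot(frag)))
# 2 XMKNGTA 140
# 12 SLLNVTX 140
```

The asparagine at position 7 is skipped because it is followed by proline. Each resulting vector would be one row of a feature matrix. A real tool would typically concatenate further per-site features, such as the structural and functional ones the abstract mentions, before training a classifier.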

