N-GlycoGo: Predicting Protein N-Glycosylation Sites on Imbalanced Data Sets by Using Heterogeneous and Comprehensive Strategy

Ching-Hsuan Chien, Chi-Wei Chen, Yen-Wei Chu, Shih-Huan Lin, Zong-Han Chang, Chi-Chang Chang

doi:10.1109/access.2020.3022629

Abstract

Glycosylation is the most complex post-translational modification of proteins. It participates in many biological processes in the human body and is closely related to many disease states. Among glycosylation types, N-linked glycosylation has the most available data. However, current N-linked glycosylation prediction tools do not take into account the serious imbalance between positive and negative data. In this study, we used protein sequence and amino acid characteristics to construct an N-linked glycosylation prediction model called N-GlycoGo. Eleven heterogeneous features were encoded based on sequence, structure, and function, and XGBoost was selected for modeling. Independent testing of the human and mouse prediction models showed that N-GlycoGo outperforms other glycosylation site prediction tools, with Matthews correlation coefficient (MCC) values of 0.397 and 0.719, respectively. We have developed a fast and accurate prediction tool, N-GlycoGo, for N-linked glycosylation. N-GlycoGo is available at http://ncblab.nchu.edu.tw/n-glycogo/.
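
The abstract reports MCC rather than plain accuracy, which is a common choice when positive and negative examples are heavily imbalanced. The following minimal sketch shows why, using the standard MCC formula on confusion-matrix counts. The counts are hypothetical and are not taken from the paper's data sets.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 10 positive sites, 990 negative sites.
# A classifier that labels every site as negative.
all_negative = dict(tp=0, tn=990, fp=0, fn=10)
# A classifier that finds 8 of the 10 positives with 20 false alarms.
useful = dict(tp=8, tn=970, fp=20, fn=2)

for name, counts in [("all_negative", all_negative), ("useful", useful)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

In this made-up case the trivial classifier has the higher accuracy but an MCC of zero, while the classifier that actually finds positive sites has a clearly positive MCC.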

Highlights

Glycosylation is the most complex and common post-translational modification and involves the enzymatic attachment of sugars to proteins.
The main purpose of an ensemble method is to improve on the performance of a single classifier.
N-GlycoGo is based on an ensemble learning model.
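
As a rough illustration of how an ensemble can improve on a single classifier, the sketch below combines three hypothetical classifiers by simple majority voting. This is a deliberately simpler technique than the gradient boosting used by XGBoost, and the labels and predictions are invented so that each classifier errs on a different sample.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most classifiers agree on for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def n_errors(predicted, truth):
    """Count samples where the predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth))


# Hypothetical labels: 1 = glycosylated site, 0 = non-glycosylated site.
truth = [1, 0, 1, 1, 0, 0]
clf_a = [1, 0, 0, 1, 0, 0]  # wrong on the third sample
clf_b = [1, 1, 1, 1, 0, 0]  # wrong on the second sample
clf_c = [0, 0, 1, 1, 0, 0]  # wrong on the first sample

ensemble = majority_vote([clf_a, clf_b, clf_c])
print([n_errors(c, truth) for c in (clf_a, clf_b, clf_c)], n_errors(ensemble, truth))
```

The vote only helps here because the three classifiers make their mistakes on different samples. Classifiers with strongly correlated errors would gain much less from voting.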

Summary

Introduction

Glycosylation is the most complex and common post-translational modification and involves the enzymatic attachment of sugars to proteins. To control and predict glycosylation, various genetic or cell culture methods of modification [3] and dynamics [4], genetic engineering [5], and genome models [6] have been used. Constructing these models requires computing tools and biological experimental methods, and parameter tuning, training, and repeated experiments require considerable time, especially for mechanistic kinetic models [7]. Although these technologies have high accuracy, the instruments are expensive. Several prediction tools instead use amino acid sequences to predict post-translational modification sites.
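
Sequence-based site predictors commonly start by cutting a fixed-length window of residues around each candidate site. Because N-linked glycosylation occurs on asparagine (N), a minimal sketch of that step might look like the following. The function name, the flank length of 10, and the `X` padding character are illustrative assumptions; this excerpt does not state the window settings used by N-GlycoGo.

```python
def asparagine_windows(sequence, flank=10, pad="X"):
    """Return (1-based position, window) for every asparagine in a sequence.

    The flank length and padding character are illustrative assumptions.
    """
    seq = sequence.upper()
    # Pad both ends so sites near the termini still yield full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "N":
            # Index i in seq is index i + flank in padded, so the window
            # centered on it spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


# Hypothetical toy sequence, not a real protein.
for position, window in asparagine_windows("MKNATLLNGS", flank=3):
    print(position, window)
```

Each window can then be passed to whatever feature encoders a model uses, with every candidate site represented by a string of the same length.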

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 12
License type: CC BY 4.0
