Abstract

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. In a leave-one-out test on these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated their performance. When we applied our method in combination with a homology-based prediction method to the annotated human genome database, H-invDB, the true positive rate of prediction improved.
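
The two ideas the abstract relies on, amino acid grouping and the area under the ROC curve, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the six-group reduced alphabet in GROUPS, the helper names reduce_sequence, kmer_features, and roc_auc, and the choice of dipeptide (k = 2) composition are not taken from the paper, whose actual groupings and features are not given in this text. The AUC is computed as the fraction of positive/negative pairs in which the positive example receives the higher score, as one might do with the held-out scores from a leave-one-out test.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet; NOT the grouping used in the paper.
GROUPS = {
    "h": "AVLIMC",  # small / aliphatic hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # conformationally special
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in seq.upper())


def kmer_features(seq, k=2):
    """Return normalized k-mer frequencies over the reduced alphabet.

    k-mers containing an unknown residue ('X') are ignored.
    """
    reduced = reduce_sequence(seq)
    kmers = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Feature vectors from kmer_features could be passed to any SVM implementation. In a leave-one-out test, each protein is scored by a model trained on all the others, and the collected scores are then passed to roc_auc together with the true labels.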

Highlights

  • Sugar chains and carbohydrate-binding proteins play important roles in several biological processes such as cell-to-cell signaling, protein folding, subcellular localization, ligand recognition, and developmental processes [1]

  • Carbohydrate-binding proteins are nonantibody proteins that can interact with sugar chains, and various keywords are used to annotate them in biological databases: “carbohydrate-binding protein”, “lectin”, and so on

  • One of the main results of this study is that the support vector machine (SVM) classifier consistently discriminated between carbohydrate-binding proteins and noncarbohydrate-binding proteins


Summary

Introduction

Sugar chains and carbohydrate-binding proteins play important roles in several biological processes such as cell-to-cell signaling, protein folding, subcellular localization, ligand recognition, and developmental processes [1]. With the rapid increase in the amount of available glycoprotein data (i.e., protein sequences), there is growing interest in the functions, physicochemical properties, and tertiary structures of carbohydrate-binding proteins and in their applications. Experimental work to identify carbohydrate-binding proteins is costly and time-consuming, so computational methods to predict carbohydrate-binding proteins would be useful. Carbohydrate-binding proteins are nonantibody proteins that can interact with sugar chains, and various keywords are used to annotate them in biological databases: “carbohydrate-binding protein”, “lectin”, and so on. The term “lectin” is widely used, but there is no general consensus as to its definition. The Shiga toxin B subunit, for example, has been annotated “lectin-like” as well as “lectin.”

