Abstract

BackgroundThe automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits the applicability to the wide functional space. Here, we proposed a novel enzymatic function prediction tool, ECPred, based on ensemble of machine learning classifiers.ResultsIn ECPred, each EC number constituted an individual class and therefore, had an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study.ConclusionsECPred is presented both as a stand-alone and a web based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. Also, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred. ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html.

Highlights

  • The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics

  • ECPred validation performance analysis The overall predictive performance of each Enzyme Commission (EC) number class was measured on the class-specific validation datasets, the generation of which were explained in the Methods section

  • In order to observe better estimates of the performance of ECPred, we carried out additional analyses using independent test sets, which are explained in the following sub-sections

Read more

Summary

Introduction

The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. We proposed a novel enzymatic function prediction tool, ECPred, based on ensemble of machine learning classifiers. Enzyme Commission (EC) numbers constitute an ontological system with the purpose of defining, organizing and storing enzyme functions in a curator friendly and machine readable format. Four levels of EC numbers are related to each other in a functional hierarchy. The system annotates the main enzymatic classes (i.e., 1: oxidoreductases, 2: transferases, 3: hydrolases, 4: lyases, 5: isomerases and 6: ligases). The EC system is the universally accepted way of annotating the enzymes in biological databases

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call