Abstract

Enzymes play an important role in metabolism by catalyzing biochemical reactions. Determining enzyme function experimentally is costly and time-consuming, so computational methods are needed to predict it. This paper presents a supervised machine learning approach that predicts the functional class and subclass of protein sequences, covering both enzymes and non-enzymes, from 857 sequence-derived features. The features come from seven sequence-derived properties: amino acid composition, dipeptide composition, correlation features, composition, transition, distribution, and pseudo amino acid composition. Recursive feature elimination (RFE) is used to select an optimal number of features. A support vector machine (SVM) is then used to construct a three-level model on the features selected by SVM-RFE: the first (top) level classifies a query protein as an enzyme or non-enzyme, the second level predicts the enzyme functional class, and the third level predicts the functional subclass. The proposed model achieved an overall accuracy of 97.6%, precision of 97.8%, and Matthews correlation coefficient (MCC) of 0.93 at the first level; accuracy of 87.3%, precision of 87.7%, and MCC of 0.84 at the second level; and accuracy of 85.6%, precision of 87.9%, and MCC of 0.86 at the third level.
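
To make two of the named feature groups and the reported MCC metric concrete, the following is a minimal Python sketch using the standard textbook definitions. It is not the authors' implementation: the function names (`amino_acid_composition`, `dipeptide_composition`, `binary_mcc`) are hypothetical, the 20-letter alphabet is an assumption about how residues are encoded, and only the binary form of MCC (as would apply to the enzyme versus non-enzyme level) is shown.

```python
from itertools import product
from math import sqrt

# Assumption: sequences use the 20 standard one-letter residue codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 values: the fraction of each standard residue in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard letters (e.g. 'X') are counted in the length but get no
    # feature of their own, so the fractions may then sum to less than 1.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each ordered residue pair."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    # Overlapping adjacent pairs: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(a + b, 0) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    query = "ACAC"  # toy sequence for illustration only
    features = amino_acid_composition(query) + dipeptide_composition(query)
    print(len(features))                 # 20 + 400 values
    print(binary_mcc(tp=45, tn=40, fp=5, fn=10))
```

In a full pipeline, vectors like these would be concatenated with the remaining feature groups before feature selection and SVM training; those steps are omitted here because the abstract does not specify their parameters.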
