PINGU: PredIction of eNzyme catalytic residues usinG seqUence information.

Priyadarshini P Pai,S S Shree Ranjani,Sukanta Mondal,Narayanaswamy Srinivasan

doi:10.1371/journal.pone.0135122

Abstract

Identification of catalytic residues can help unveil attributes of enzyme function that matter for various therapeutic and industrial applications. The number of catalytic residues and the sequence length vary between enzymes according to their biochemical roles. This article describes a prediction approach (PINGU) for such a scenario. It uses support vector machine models trained on physicochemical properties and evolutionary information from 650 non-redundant enzymes (2136 catalytic residues). Independent testing on 200 non-redundant enzymes (683 catalytic residues) in predefined prediction settings, i.e., with the number of non-catalytic residues per catalytic residue ranging from 1 to 30, suggested that the approach remained highly sensitive and specific, i.e., 80% or above, as the challenge increased. To learn more about the discriminatory power of PINGU in real scenarios, where the prediction challenge is variable and prone to high false-positive rates, the best model from independent testing was applied to 60 diverse enzymes. Results suggested that PINGU identified most catalytic and non-catalytic residues correctly, with accuracy, sensitivity and specificity of 80% or above. The effect of false positives on precision was addressed by applying predicted ligand-binding residue information as a post-processing filter. This yielded an overall improvement of 20% in F-measure and 0.138 in correlation coefficient, with precision enhanced by 16%. Given this encouraging performance, PINGU may eventually find applications in enzyme engineering and novel drug discovery.
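The evaluation measures named in the abstract (accuracy, sensitivity, specificity, precision, F-measure and correlation coefficient) can all be derived from the four counts of a binary confusion matrix. The sketch below is not the authors' implementation. It assumes the standard definitions of these measures, takes the correlation coefficient to be the Matthews correlation coefficient, and models the ligand-binding post-processing step as a simple intersection (a residue stays predicted catalytic only if it is also predicted ligand-binding). The function names and the toy labels are hypothetical and serve only as illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels: 1 = catalytic, 0 = non-catalytic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Compute the standard measures from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Assumed here to be the Matthews correlation coefficient.
        "mcc": _ratio(tp * tn - fp * fn,
                      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def ligand_filter(y_pred, ligand_pred):
    """Assumed filter rule: keep a catalytic call only at predicted ligand-binding positions."""
    return [p if lig == 1 else 0 for p, lig in zip(y_pred, ligand_pred)]


# Toy per-residue labels, invented for illustration only.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
ligand_pred = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]

before = evaluate(y_true, y_pred)
after = evaluate(y_true, ligand_filter(y_pred, ligand_pred))
print("precision before filter:", round(before["precision"], 3))
print("precision after filter: ", round(after["precision"], 3))
```

In this toy case the filter removes one false positive without losing a true positive, so precision, F-measure and the correlation coefficient all rise while sensitivity is unchanged. On real data an intersection filter of this kind could also discard true catalytic residues that are not predicted as ligand-binding.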

Highlights

Enzymes play a key role in catalyzing biochemical reactions important for life
To construct suitable training and independent test datasets, enzyme data for predictor development were collected from the datasets created by Dou et al. [20] and the Catalytic Site Atlas (CSA) 2.0 dataset [8]
A pool of sequence information based on the ATOM record in these Protein Data Bank (PDB) structures was generated for the study

Summary

Introduction

Enzymes play a key role in catalyzing biochemical reactions important for life. Their function is governed by a small number of amino acids known as catalytic residues. By means of their structure and chemical properties, these residues take part directly in catalysis and determine, to a certain extent, the chemical properties of the enzyme. Knowledge of the catalytic residues can help unravel enzyme functions and, in the long run, boost enzyme engineering and drug design applications [1, 2].

Methods

Results

Conclusion