Abstract

We present an algorithm for identifying enzyme function, Catalytic Site Identification (CatSId), based on the identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also efficient with regard to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated catalytic residues, the Catalytic Site Atlas (CSA). Two main processes are involved. The first process is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second process incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score matches found in the first process. This approach shows very good performance overall, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.971 for the training set evaluated. The procedure is able to process cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites, whose templates contain more than two critical (catalytic) residues, show excellent performance, with AUCs greater than 0.90 for both the training and test data. The procedure has considerable promise for larger-scale searches.
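The second process described above, logistic re-scoring of candidate matches, can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor names (`rmsd`, `binding_site_overlap`), the weights, and the bias are hypothetical values chosen only to show how a logistic function turns several physical descriptors into a single score between 0 and 1.

```python
import math

# Hypothetical coefficients for illustration only; a real model would fit
# these to annotated training data rather than hard-code them.
WEIGHTS = {"rmsd": -2.0, "binding_site_overlap": 1.5}
BIAS = 1.0


def logistic_score(descriptors, weights=WEIGHTS, bias=BIAS):
    """Map a match's descriptors to a value in (0, 1) with a logistic function."""
    z = bias + sum(weights[name] * descriptors[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def rescore(matches):
    """Order candidate template matches by descending logistic score."""
    return sorted(
        matches,
        key=lambda m: logistic_score(m["descriptors"]),
        reverse=True,
    )


# Two made-up candidate matches, standing in for the output of the
# template-matching step.
matches = [
    {"template": "T2", "descriptors": {"rmsd": 1.8, "binding_site_overlap": 0.1}},
    {"template": "T1", "descriptors": {"rmsd": 0.4, "binding_site_overlap": 0.9}},
]
ranked = rescore(matches)
```

With these assumed weights, a match with low geometric deviation and high overlap with a predicted binding site ranks above one with the opposite profile.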

Highlights

  • Given the success of the structural genomics efforts (1125 PDB entries) and many genome sequencing efforts, automated protein function annotation is critical [1]

  • We developed an automated protein function identification method based on the hypothesis that catalytic residues and their geometric arrangement are key determinants for enzymatic chemical activity

  • We have developed an automated procedure for protein function prediction based on the identification of catalytic site residues, called Catalytic Site Identification (CatSId)


Summary

Introduction

Given the success of the structural genomics efforts (1125 PDB entries) and many genome sequencing efforts, automated protein function annotation is critical [1]. At the core of many automated methods is the principle that sequence and structure dictate function. One approach is to infer function by focusing on global sequence or structural similarity. Methods that combine sequence and structural information, including EFICAz [21,22], SOIPPA [23,24,25], DISCERN [26], PevoSOAR [27], and AnnoLite [28], can improve on sequence-based methods alone. The success of global similarity-based techniques depends largely on the ability to distinguish conservation patterns that correspond to functional or catalytic portions of a protein sequence or structure. The approach we present in this work is designed to leverage knowledge of specific catalytic site residues rather than to infer functional features from global comparisons.

