More than 14,300 Structural Genomics (SG) protein structures have been deposited in the PDB by protein structure initiatives, yet most of these SG proteins carry unknown or putative function annotations. This accumulated structural information is a tremendous contribution to structural biology and genomics, and accurate functional annotations for these SG proteins would add substantial value to it. Our approach to functional annotation and validation predicts functional assignments from structure‐based computed chemical properties and local structure matching, followed by biochemical validation. This research focuses on four superfamilies: Crotonase, Ribulose Phosphate Binding Barrel, 6‐Hairpin Glycosidase, and Concanavalin A‐like Lectins and Glucanases. First, Partial Order Optimum Likelihood (POOL) is used to computationally predict the catalytically important residues in each protein structure. Next, Structurally Aligned Local Sites of Activity (SALSA) develops spatially‐localized consensus signatures for the proteins of known function in each functional family within each superfamily, based on POOL‐predicted residues and functionally characterized residues of importance. Then, the POOL‐predicted residues for each SG protein are compared to each consensus signature and scored to determine their degree of similarity at the local active site. We also introduce a new, computationally faster method for sorting protein superfamilies and annotating protein function using local structure matching in graph representation: Graph Representation of Active Sites for Prediction of Function (GRASP‐Func). Sets of tetrahedra are generated through Delaunay triangulation for each protein structure using the alpha carbon atoms of each residue.
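The triangulation step can be illustrated with a minimal sketch, assuming Python with NumPy and SciPy; it is not the GRASP‐Func implementation, and the function names `read_ca_coords` and `ca_tetrahedra` are hypothetical. It reads alpha carbon coordinates from PDB‐format ATOM records and returns the residue‐index quadruples that form the Delaunay tetrahedra.

```python
import numpy as np
from scipy.spatial import Delaunay


def read_ca_coords(pdb_lines):
    """Collect alpha carbon (CA) coordinates from PDB-format ATOM records.

    Simplification (an assumption of this sketch): alternate locations and
    multiple models are not filtered, so every CA ATOM line is kept.
    """
    coords = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16,
        # x, y, z in columns 31-38, 39-46, 47-54 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            )
    return np.asarray(coords, dtype=float)


def ca_tetrahedra(coords):
    """Return an (n_tetrahedra, 4) integer array of point indices.

    Each row lists the four CA atoms (by position in `coords`) that form
    one tetrahedron of the 3D Delaunay triangulation.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("coords must have shape (n_points, 3)")
    triangulation = Delaunay(coords)
    # Sort indices within each row so tetrahedra compare consistently.
    return np.sort(triangulation.simplices, axis=1)
```

With a structure file on disk, `ca_tetrahedra(read_ca_coords(open(path)))` would give the tetrahedra for that protein, where `path` is a placeholder for a local PDB‐format file. How tetrahedra are then matched between proteins is not specified here and is left out of the sketch.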
Then, proteins with matched tetrahedra are grouped together, and graph images are generated in which each protein is a node and edges connect it to neighbors with similar active sites. We compare SALSA and GRASP‐Func and show that both methods correctly sort the superfamilies into their respective functional families. Both methods also make similar functional predictions for the SG proteins, with GRASP‐Func requiring far less computation time. Thus GRASP‐Func enables large‐scale comparisons and functional assignments within and across superfamilies. Finally, we test these predictions biochemically to confirm function. Biochemical data for the Crotonase Superfamily show that while the proteins have some promiscuous functionality, our methods predict the correct dominant function for each protein tested. The goal of this project is to provide a validated approach to functional annotation to enable applications from drug target identification to green chemistry and biofuel production.

Support or Funding Information

Support from NSF‐CHE‐1305655, NSF‐MCB‐1158176, NSF‐MCB‐1517290, PhRMA Foundations (Predoctoral Fellowship in Informatics awarded to CLM), NSF‐GRFP (JSL), MathWorks, Inc., and American Cancer Society Research Scholar Grant RSG‐12‐161‐01‐DMC (PJB).

This abstract is from the Experimental Biology 2018 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.