Abstract

There are now over 14,000 Structural Genomics (SG) protein structures deposited in the Protein Data Bank (PDB), and most of these are of unknown or uncertain biochemical function. Reliable computational methods for predicting the function of protein structures are an important current need. Typically, functions are assigned using informatics‐based approaches, and the annotation of protein function by automated means has led to high rates of misannotation in some databases. Here we present a complementary and powerful approach based on computed chemical properties of the individual residues in a protein structure. Partial Order Optimum Likelihood (POOL) is used to predict the residues in the query protein structure that are important for catalysis. Typically these include the first‐layer residues that make direct contact with the substrate molecule(s) and also some residues in the second and third layers that play supporting roles in the catalytic process. Then, for proteins of known biochemical function, Graph Representation of Active Sites for Prediction of Function (GRASP‐Func) establishes local arrays of POOL‐predicted residues that are common to proteins of the same function. Next, the local arrays of POOL‐predicted residues of the query (SG) protein are aligned with those of the proteins of known function for each functional type. These alignments, each SG protein against each functional family, are scored in order to predict the most likely function of each SG protein. Results are reported for the SG members of the Ribulose Phosphate Binding Barrel (RPBB), Clp‐Crotonase, and Haloacid Dehalogenase superfamilies. While we find the SG proteins in the RPBB superfamily to be well annotated, we predict very high annotation error rates (about 70%) in the Clp‐Crotonase superfamily. Of particular interest are cases of predicted misannotation, where our predicted function differs from the assigned function. Our predictions are tested experimentally by direct biochemical assays.

Support or Funding Information

National Science Foundation under grant number CHE‐1305655, MathWorks, Inc. and a PhRMA Foundation Fellowship awarded to CLM.

This abstract is from the Experimental Biology 2019 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
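
The scoring step described in the abstract, in which each SG protein is compared against each functional family and the best-scoring family is taken as the predicted function, can be illustrated with a minimal Python sketch. The abstract does not specify the GRASP‐Func scoring function, so `toy_similarity` below is a placeholder (a multiset overlap of residue types), and the family names, residue lists, and function names are all hypothetical. The sketch only shows the control flow: score against every family, take the maximum, and flag a predicted misannotation when the result disagrees with the assigned function.

```python
from collections import Counter


def toy_similarity(query_residues, family_residues):
    """Placeholder score: Jaccard-like overlap of residue-type multisets.

    ASSUMPTION: this is NOT the GRASP-Func scoring function, which the
    abstract does not describe. It stands in for any alignment score.
    """
    q, f = Counter(query_residues), Counter(family_residues)
    shared = sum((q & f).values())   # residues present in both
    total = sum((q | f).values())    # residues present in either
    return shared / total if total else 0.0


def predict_function(query_residues, family_signatures):
    """Score the query against every family and return the best match."""
    scores = {
        name: toy_similarity(query_residues, signature)
        for name, signature in family_signatures.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def is_predicted_misannotation(predicted, annotated):
    """A predicted misannotation is a disagreement with the assigned function."""
    return predicted != annotated


# Hypothetical data: predicted catalytic residues per functional family.
FAMILY_SIGNATURES = {
    "family_A": ["ASP", "LYS", "HIS", "GLU"],
    "family_B": ["SER", "HIS", "ASP", "GLY"],
}

# Hypothetical query: predicted catalytic residues of one SG protein.
QUERY = ["ASP", "LYS", "GLU", "ARG"]

if __name__ == "__main__":
    best, scores = predict_function(QUERY, FAMILY_SIGNATURES)
    print(best, scores)
    print(is_predicted_misannotation(best, annotated="family_B"))
```

In a real pipeline the placeholder score would be replaced by the actual alignment score of local residue arrays, and the family signatures would be derived from proteins of known function rather than written by hand.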
