Protein kinases are central to cellular activities and are actively pursued as drug targets for several conditions, including cancer and autoimmune diseases. Despite the availability of a large structural database for kinases, methods that elucidate the structure-function relationships of these proteins without manual intervention are lacking. Such techniques are essential to structural biology and to accelerating drug discovery. Here, we implement an interpretable graph neural network (GNN) framework for classifying the functionally active and inactive states of a large set of protein kinases using only their tertiary structure and amino acid sequence. We show that the GNN models can classify kinase structures with high accuracy (>97%). We implement Gradient-weighted Class Activation Mapping for graphs (Graph Grad-CAM) to automatically identify structurally important residues and residue-residue contacts of the kinases without any a priori input. We show that the motifs identified through the Graph Grad-CAM methodology are functionally critical, consistent with the existing kinase literature. Notably, the interpretable framework identifies the highly conserved DFG and HRD motifs of the well-known hydrophobic spine, in addition to some lesser-known motifs. Further, using Grad-CAM maps as vector embeddings of the protein structures, we identify subtle differences in the crystal structures among different sub-classes of kinases in the Protein Data Bank (PDB). Frameworks such as the one implemented here, which enable high-throughput identification of protein structure-function relationships, are essential for designing targeted small-molecule therapies as well as for engineering new proteins for novel applications.
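
To make the Graph Grad-CAM step concrete, the following is a minimal sketch on a toy four-residue graph, not the architecture or code used in this work. It assumes a single graph-convolution layer with a row-normalized adjacency matrix, mean pooling, and a linear class score. Every function name, feature value, and weight is hypothetical and chosen only for illustration. Gradients are approximated with central finite differences so that the example needs no autodiff library; a real model would obtain them by backpropagation.

```python
# Toy Graph Grad-CAM in pure Python. All names and numbers are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(0.0, v) for v in row] for row in m]

def normalize_adjacency(adj):
    """Add self-loops, then divide each row by its sum (row normalization)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def gcn_layer(adj, x, weight):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    return relu(matmul(matmul(normalize_adjacency(adj), x), weight))

def class_score(feats, w_class):
    """Mean-pool node features over the graph, then apply a linear class weight."""
    n = len(feats)
    pooled = [sum(row[k] for row in feats) / n for k in range(len(w_class))]
    return sum(p * w for p, w in zip(pooled, w_class))

def feature_gradients(feats, w_class, eps=1e-6):
    """Central finite-difference gradient of the class score w.r.t. each node feature."""
    grads = []
    for i, row in enumerate(feats):
        grad_row = []
        for k in range(len(row)):
            up = [r[:] for r in feats]
            down = [r[:] for r in feats]
            up[i][k] += eps
            down[i][k] -= eps
            diff = class_score(up, w_class) - class_score(down, w_class)
            grad_row.append(diff / (2 * eps))
        grads.append(grad_row)
    return grads

def graph_grad_cam(feats, grads):
    """alpha_k = mean over nodes of d(score)/d(F[n][k]); heat[n] = ReLU(sum_k alpha_k * F[n][k])."""
    n = len(feats)
    alphas = [sum(g[k] for g in grads) / n for k in range(len(feats[0]))]
    heat = [max(0.0, sum(a * f for a, f in zip(alphas, row))) for row in feats]
    return alphas, heat

# Hypothetical contact graph: four residues connected in a chain 0-1-2-3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # made-up per-residue features
weight = [[1.0, -1.0], [0.5, 1.0]]                     # made-up layer weights
w_active = [1.0, -0.5]                                 # made-up weights for the "active" class

feats = gcn_layer(adj, x, weight)
grads = feature_gradients(feats, w_active)
alphas, heat = graph_grad_cam(feats, grads)

# Rank residues by their contribution to the "active" score.
ranking = sorted(range(len(heat)), key=lambda i: heat[i], reverse=True)
print("per-residue importance:", [round(h, 4) for h in heat])
print("residues ranked by importance:", ranking)
```

In this sketch the per-residue heat values play the role of the importance map described above. Collecting them into a fixed-length vector per structure is one plausible way to read the "Grad-CAM maps as vector embeddings" idea, although the exact construction used in the paper is not specified in this abstract.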