Protein kinases and phosphatases are key signaling proteins and important drug targets. The number of publicly available 3D protein structures has grown rapidly in recent years. However, the three-dimensional structures of kinases and phosphatases have not been systematically investigated, because it is difficult to design structure-based descriptors capable of quantifying conformational changes. We have developed a triangular spatial relationship (TSR)-based algorithm that enables a unique representation of a protein's 3D structure as a vector of integers (keys). The main objectives of this study are to provide structural insight into conformational changes and to link TSR-based structural descriptors to protein function. We studied the 3D structures of 2527 kinases and 505 phosphatases. The study yields several major findings: (i) The clustering method yields functionally coherent clusters of kinase and phosphatase families and their superfamilies. (ii) Specific TSR keys are identified as structural signatures for different types of kinases and phosphatases. (iii) TSR keys can identify different conformations of the well-known DFG motif of kinases. (iv) A significant number of phosphatases have their own distinct DFG motifs. The TSR keys from kinases and phosphatases agree with each other. TSR keys are successfully used to represent and quantify the conformational changes of CDK2 upon cyclin binding or phosphorylation. TSR keys are effective as features for unsupervised machine learning and for key searches. Once discriminative TSR keys are identified, they can be mapped back to atomic details within the amino acids involved. In conclusion, this study presents an advanced computational methodology with significant advantages, both in representing and quantifying conformational changes of protein structures and in directly linking protein structures to their functions.