In living cells, proteins bind small molecules (or “ligands”) through a “conformational selection” mechanism, in which only a subset of protein conformations can bind the small molecules well, while most other conformations cannot. The present work uses machine learning approaches to identify, in a very large number of protein:ligand complexes, which protein properties are associated with the capacity to bind small molecules. To do so, we calculate 40 physicochemical properties on about 1.5 million protein conformation:ligand complexes and protein conformations. We describe a machine learning approach that identifies the physicochemical descriptors of a protein that maximize the prediction rate of potential binding conformations for three test-case proteins: ADORA2A (Adenosine A2a Receptor), ADRB2 (Adrenoceptor Beta 2) and OPRK1 (Opioid Receptor Kappa 1). We find that suitable machine learning techniques can improve the identification of “binding protein conformations” in an otherwise very large ensemble of protein conformations by an order of magnitude compared to random selection of protein conformations. This opens the door to the systematic identification of such “binding conformations” for proteins and provides a big data approach to the conformational selection mechanism.
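The comparison against random selection can be illustrated with an enrichment factor: the fraction of binding conformations among the top-ranked predictions, divided by their fraction in the whole ensemble. The abstract does not state which metric was used, so the sketch below is an assumption, and the names `enrichment_factor`, `scores`, `is_binding` and `top_k` are hypothetical.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_binding: Sequence[bool],
    top_k: int,
) -> float:
    """Hit rate in the top_k highest-scored conformations relative to
    the hit rate expected from random selection (illustrative metric,
    not necessarily the one used in the work described above)."""
    if len(scores) != len(is_binding):
        raise ValueError("scores and is_binding must have the same length")
    if not 0 < top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the ensemble size")

    total_binding = sum(is_binding)
    if total_binding == 0:
        raise ValueError("the ensemble contains no binding conformations")

    # Rank conformation indices from highest to lowest model score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Fraction of binding conformations among the top_k predictions.
    hit_rate = sum(is_binding[i] for i in ranked[:top_k]) / top_k

    # Fraction of binding conformations in the full ensemble, which is
    # the expected hit rate of random selection.
    base_rate = total_binding / len(scores)

    return hit_rate / base_rate


# Toy ensemble: 10 conformations, one of which binds (made-up values).
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]
toy_labels = [True] + [False] * 9
toy_ef = enrichment_factor(toy_scores, toy_labels, top_k=1)
```

In this toy ensemble the single binding conformation has the highest score, so selecting the top-ranked conformation is ten times more likely to yield a binder than picking one at random. An order-of-magnitude improvement over random selection corresponds to an enrichment factor of about 10 under this definition.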