Machine Learning to Identify Flexibility Signatures of Class A GPCR Inhibition.

Joseph Bemister-Buffington, Sebastian Raschka, Leslie A Kuhn, Alex J Wolf

doi:10.3390/biom10030454


Open Access

https://doi.org/10.3390/biom10030454


Abstract

We show that machine learning can pinpoint features distinguishing inactive from active states in proteins, in particular identifying key ligand binding site flexibility transitions in GPCRs that are triggered by biologically active ligands. Our analysis was performed on the helical segments and loops in 18 inactive and 9 active class A G protein-coupled receptors (GPCRs). These three-dimensional (3D) structures were determined in complex with ligands. However, considering the flexible versus rigid state identified by graph-theoretic ProFlex rigidity analysis for each helix and loop segment with the ligand removed, followed by feature selection and k-nearest neighbor classification, was sufficient to identify four segments surrounding the ligand binding site whose flexibility/rigidity accurately predicts whether a GPCR is in an active or inactive state. GPCRs bound to inhibitors were similar in their pattern of flexible versus rigid regions, whereas agonist-bound GPCRs were more flexible and diverse. This new ligand-proximal flexibility signature of GPCR activity was identified without knowledge of the ligand binding mode or previously defined switch regions, while being adjacent to the known transmission switch. Following this proof of concept, the ProFlex flexibility analysis coupled with pattern recognition and activity classification may be useful for predicting whether newly designed ligands behave as activators or inhibitors in protein families in general, based on the pattern of flexibility they induce in the protein.
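The pipeline described above (a flexible-or-rigid label per segment, feature selection, then k-nearest neighbor classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are invented toy values rather than ProFlex output, the segment count is arbitrary, Hamming distance is one plausible metric for binary features, and the exhaustive subset search scored by leave-one-out accuracy is a stand-in for whatever feature selection procedure the authors used.

```python
from collections import Counter
from itertools import combinations

# Toy data, invented for illustration (NOT from the paper):
# each row marks 5 hypothetical segments as flexible (1) or rigid (0).
X = [
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
]
y = ["inactive"] * 4 + ["active"] * 4


def hamming(a, b, cols):
    """Count the selected segments whose flexible/rigid state differs."""
    return sum(a[c] != b[c] for c in cols)


def knn_predict(train_X, train_y, query, cols, k=3):
    """Majority vote among the k training structures closest to the query."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: hamming(train_X[i], query, cols))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, cols, k=3):
    """Leave-one-out accuracy: hold out each structure once and predict it."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], cols, k) == y[i]
    return hits / len(X)


def best_subset(X, y, max_size=2, k=3):
    """Try every segment subset up to max_size; keep the smallest best one."""
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for cols in combinations(range(len(X[0])), size):
            acc = loo_accuracy(X, y, cols, k)
            if acc > best[0]:  # strict '>' favors smaller, earlier subsets
                best = (acc, cols)
    return best


accuracy, segments = best_subset(X, y)
print(f"selected segments: {segments}, leave-one-out accuracy: {accuracy:.2f}")
```

Restricting the search to small subsets mirrors the motivation stated in the abstract: with only a few dozen structures, a classifier built on a handful of segments is less prone to overfitting than one that uses every segment.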

Highlights

Recognizing the features of small, drug-like ligand molecules and protein structures that synergize to create an active protein state versus an inactive protein state is essential for designing drugs with predictable effects on the protein and organism.
We focus on the other side of the interface, seeking a general method that can learn from a series of active and inactive structures in a protein family to identify the shared subset of protein features that are reliable indicators of whether the protein is in an active or inactive state.
To identify characteristic flexibility features and avoid overfitting when predicting protein activity, we focused on identifying the subset of features most likely to contain useful information (Figure 3).

Summary

Introduction

Recognizing the features of small, drug-like ligand molecules and protein structures that synergize to create an active protein state (binding an agonist ligand) versus an inactive protein state (binding an inhibitory ligand) is essential for designing drugs with predictable effects on the protein and organism. Much drug discovery research has focused on mimicking small-molecule ligands of known activity (when available), either by incorporating very similar chemical groups that lead to cost-effective synthesis and favorable bioavailability and toxicity profiles, or by mimicking the three-dimensional volumes and chemical surface features of such molecules [1,2,3]. Such molecules often bind the protein with moderate to high affinity, but they do not always produce the activating or inhibitory effect that is sought. We explore whether a small number of intrinsic protein flexibility features can reliably predict whether a given protein is in an active or inactive state.

Objectives

Methods

Results

Conclusion