Machine Learning Methods for Rapid Prediction of Thermostabilizing Mutants of G‐protein‐coupled receptors in Detergents

Nagarajan Vaidehi, Srisairam Achuthan, Christopher G Tate, Suvamay Jana, Reinhard Grisshammer, Sanychen Muk, Manbir Sandhu, Supriyo Bhattacharya

doi:10.1096/fasebj.2018.32.1_supplement.555.17

Abstract

G protein‐coupled receptors (GPCRs) are highly dynamic and often denature when extracted in detergents. Deriving thermostable mutants has been a successful strategy for stabilizing GPCRs in detergents, but the process is experimentally tedious. We have developed a computational method that scans the entire receptor for alanine or leucine mutations and calculates a stability score for the mutation at each position in the GPCR structural model. The method generates homology models of the receptor with varying accuracy, together with an ensemble of conformations obtained by sampling the rigid‐body degrees of freedom of the transmembrane helices. An all‐atom force field function is then used to calculate the enthalpy gain upon mutating every residue in these receptor structures to alanine; we call this quantity the “stability score”. Including conformational sampling to account for structural perturbations improves the thermostability predictions compared with using a single structure. We have validated the method against experimentally measured thermostability data for single mutants of the β1‐adrenergic receptor (β1AR), adenosine A2A receptor (A2AR) and neurotensin receptor 1 (NTSR1). We will present some of our recent results on blind predictions of thermostabilizing mutations in NTSR1.

Using the computed energy components in our dataset, we have recently tested several machine learning algorithms for rapid and reliable prediction of thermostable and non‐thermostable mutants. We used MATLAB to build a binary classification model (thermostable vs. non‐thermostable) trained on data from an ensemble of three proteins, with 25% of the dataset held out for validation, and tested it against a fourth protein omitted from the training set. The initial dataset was heavily imbalanced toward the “non‐thermostable” class; we balanced it by randomly replicating samples from the underrepresented “thermostable” class.
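The data handling described above (leave‐one‐protein‐out testing, a 25% holdout, random replication of the minority class, and precision/recall scoring) can be sketched as follows. This is a minimal, standard‐library Python sketch, not the authors' MATLAB pipeline: the record layout `(protein, features, label)` and all function names are hypothetical, and oversampling only the training portion after the holdout split (so replicated mutants cannot leak into the validation data) is an assumption, since the abstract does not state the order of these steps.

```python
import random
from collections import Counter

POSITIVE = "thermostable"      # minority class in the described dataset
NEGATIVE = "non-thermostable"  # majority class

def split_by_protein(records, test_protein):
    """Leave-one-protein-out: train on all other receptors, test on one.
    Each record is a hypothetical (protein, features, label) tuple."""
    train = [r for r in records if r[0] != test_protein]
    test = [r for r in records if r[0] == test_protein]
    return train, test

def holdout_split(records, holdout_frac=0.25, seed=0):
    """Shuffle with a fixed seed and return (training, holdout) lists,
    reserving holdout_frac of the records for validation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_hold = round(holdout_frac * len(shuffled))
    return shuffled[n_hold:], shuffled[:n_hold]

def oversample_minority(records, seed=0):
    """Balance a two-class dataset by randomly replicating minority-class
    records (sampling with replacement) until both classes are equal in size."""
    counts = Counter(r[2] for r in records)
    if len(counts) < 2:
        return list(records)  # only one class present: nothing to balance
    rng = random.Random(seed)
    minority, majority = sorted(counts, key=counts.get)
    pool = [r for r in records if r[2] == minority]
    deficit = counts[majority] - counts[minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return list(records) + extra

def precision_recall(y_true, y_pred, positive=POSITIVE):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In use, one would call `split_by_protein` with the omitted receptor as `test_protein`, apply `holdout_split` to the remaining three proteins, balance only the training subset with `oversample_minority`, fit any classifier (for example a decision tree or k‐NN model) on it, and score predictions on the holdout and test sets with `precision_recall`, treating “thermostable” as the positive class.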
Models trained with decision tree algorithms outperformed those trained with k‐nearest neighbor (k‐NN) algorithms in our study. Precision and recall for the best‐performing ensemble dataset were: Boosted Trees, 62% precision and 34% recall; Simple Trees, 30% precision and 90% recall; Complex Trees, 36% precision and 73% recall; Medium k‐NN, 28% precision and 34% recall. These results are promising for identifying true thermostable mutants, and we are working to reduce the number of false‐positive predictions. To this end, we are testing other training algorithms, such as support vector machines and additional k‐NN and decision tree models, and potentially ensemble models that combine several of these algorithms. Our method is the first step toward rapid computational prediction of thermostable mutants of GPCRs. Such tools are critical for reliable and rapid prediction of thermostabilizing mutations of GPCRs in drug discovery.

Support or Funding Information

Funding for this work was provided by NIH grant R01GM097261 to N.V.

This abstract is from the Experimental Biology 2018 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
