Identification of Hot Spots in Protein Structures Using Gaussian Network Model and Gaussian Naive Bayes

Hua Zhang, Tao Jiang, Guogen Shan

doi:10.1155/2016/4354901

Abstract

Residue fluctuations in protein structures have been shown to be highly associated with various protein functions. The Gaussian network model (GNM), a simple and representative coarse-grained model, has been widely adopted to reveal function-related protein dynamics. We directly utilized the high-frequency modes generated by GNM and then applied Gaussian Naive Bayes (GNB) to identify hot spot residues. Two coding schemes for the feature vectors were implemented, with varying distance cutoffs for GNM and sliding-window sizes for GNB, and evaluated by tenfold cross-validation: one uses only a single high-frequency mode, and the other combines multiple modes with the highest frequencies. In overall performance, evaluated by the F1 measure, our proposed methods outperformed previous work that did not directly utilize the high-frequency modes generated by GNM. Moreover, we found that including more high-frequency modes in the GNB classifier can significantly improve sensitivity. The present study provides additional valuable insights into the relation between hot spots and residue fluctuations.
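The classification pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each residue is assumed to carry a precomputed value from a GNM mode (for example, the squared eigenvector component of a high-frequency mode), the sliding window is assumed to be centred on the residue with zero padding at the chain ends, and the multi-mode scheme is assumed to concatenate the windows of several modes. The names `window_features`, `multi_mode_features`, and the hand-written `GaussianNB` class are hypothetical.

```python
import numpy as np


def window_features(profile, window):
    """Hypothetical single-mode scheme: sliding-window encoding of a per-residue profile.

    Residue i gets the values of residues i - h .. i + h (h = window // 2);
    positions beyond the chain ends are zero-padded (an assumption).
    """
    if window % 2 != 1:
        raise ValueError("window size must be odd")
    h = window // 2
    padded = np.pad(np.asarray(profile, dtype=float), h)  # constant zero padding
    return np.stack([padded[i:i + window] for i in range(len(profile))])


def multi_mode_features(profiles, window):
    """Hypothetical multi-mode scheme: concatenate the window features of several modes."""
    return np.hstack([window_features(p, window) for p in profiles])


class GaussianNB:
    """Minimal Gaussian Naive Bayes: one independent normal per feature and class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Small variance floor so constant features do not cause division by zero.
        eps = self.var_smoothing * X.var(axis=0).max() + 1e-12
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + eps for c in self.classes_])
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Per-class log-likelihood: sum over features of log N(x | mean, var).
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        # Add log priors and pick the most probable class for each residue.
        return self.classes_[np.argmax(ll + self.log_prior_, axis=1)]
```

For example, `window_features([0.1, 0.5, 0.9, 0.2], 3)` yields one three-value row per residue, with zeros standing in for the missing neighbours of the first and last residues. Writing the classifier by hand keeps the Gaussian likelihood explicit; a library implementation such as scikit-learn's `GaussianNB` could be used instead.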

Highlights

Computational outcomes of the prediction performance, ordered by F1 measure, where the feature vector for a Gaussian Naive Bayes (GNB) classifier was extracted from a single mode, that is, the ith highest mode (i = 1, 2, ..., 20). The distance cutoff in GNM varied from 6.0 to 8.0 Å with a step size of 0.1 Å, and the sliding window for one mode ranged from 1 to 21 with a step size of 2.
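The parameter sweep described in this caption can be enumerated as below, together with the F1 measure used to rank the results. This is a hedged sketch: the helper name `f1_measure` is hypothetical, hot spots are assumed to be labelled 1, and the per-configuration training and tenfold cross-validation are left out. The sweep covers 21 cutoffs × 11 window sizes × 20 modes = 4620 configurations.

```python
from itertools import product


def f1_measure(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the hot spot class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the sensitivity
    return 2 * precision * recall / (precision + recall)


# Grid from the caption; integer steps plus rounding avoid floating-point drift.
cutoffs = [round(6.0 + 0.1 * k, 1) for k in range(21)]  # 6.0, 6.1, ..., 8.0
windows = list(range(1, 22, 2))                         # 1, 3, ..., 21
modes = list(range(1, 21))                              # ith highest mode, i = 1..20
grid = list(product(cutoffs, windows, modes))           # (cutoff, window, mode) triples
```

For instance, `f1_measure([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])` has two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and F1 is 2/3.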

Summary

Introduction

Flexibility and dynamics play key roles for proteins in implementing various biological processes and functions [1, 2]. Molecular dynamics (MD) simulations and normal mode analysis (NMA) are widely used to investigate the dynamic link between protein structures and functions. Elastic network models (ENMs), including the isotropic Gaussian network model (GNM) [8, 9] and the anisotropic network model [10], define spring-like interactions between residues that lie within a certain cutoff distance. They simplify the computationally costly all-atom potentials into a quadratic function in the vicinity of the native state, which allows the motions to be decomposed into vibrational modes with different frequencies, known as normal modes. ENM and GNM have been validated in numerous applications, showing reasonable agreement with a wealth of experimental data, including prediction of X-ray crystallographic B-factors for amino acids [9, 11], identification of hot spots [12,13,14], catalytic sites [15], core amino acids stabilizing rhodopsin [16], and important residues of HLA proteins [17], elucidation of the molecular mechanisms of motor-protein motions [18], and general conformational changes and functions [3, 4, 19,20,21,22,23,24,25,26,27,28,29,30,31].
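The GNM construction described above can be sketched as follows. This is a minimal illustration, assuming C-alpha coordinates in Å, a uniform spring constant, and a connected contact network, so that exactly one zero eigenvalue is discarded. The Kirchhoff matrix Γ has Γ_ij = -1 for contacting residues i ≠ j and Γ_ii equal to the number of contacts of residue i; its eigenvectors are the GNM modes, and larger eigenvalues correspond to higher-frequency modes. Using the squared eigenvector components as a per-residue profile is an assumption for illustration, the default cutoff of 7.0 Å is just one value in the 6.0 to 8.0 Å range scanned in this study, and the function names are hypothetical.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff matrix from C-alpha coordinates (Å).

    Gamma_ij = -1 if residues i != j are within `cutoff`, else 0;
    Gamma_ii = number of contacts of residue i, so every row sums to 0.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = np.where(contact, -1.0, 0.0)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma


def gnm_modes(gamma):
    """Eigen-decompose Gamma and drop the zero mode (assumes a connected network)."""
    vals, vecs = np.linalg.eigh(gamma)  # ascending eigenvalues, orthonormal eigenvector columns
    return vals[1:], vecs[:, 1:]        # ordered from slowest to fastest mode


def high_mode_profile(gamma, i=1):
    """Squared components of the ith highest-frequency mode (i = 1 is the fastest)."""
    _, vecs = gnm_modes(gamma)
    return vecs[:, -i] ** 2             # sums to 1 because eigenvectors are normalized
```

On a toy five-residue straight chain with 3.8 Å spacing, only sequential neighbours fall within 7.0 Å, and the profile of the fastest mode peaks at the central residue.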

Objectives

Methods

Results

Conclusion