Simple knowledge-based descriptors to predict protein-ligand interactions. Methodology and validation.

J Willem M Nissink, Marcel L Verdonk, Gerhard Klebe

doi:10.1023/a:1008109717641

Abstract

A new type of shape descriptor is proposed to describe the spatial orientation of non-covalent interactions. It is built from simple, anisotropic Gaussian contributions that are parameterised by 10 adjustable values. The descriptors have been used to fit propensity distributions derived from scatter data stored in the IsoStar database. This database holds composite pictures of possible interaction geometries between a common central group and various interacting moieties, as extracted from small-molecule crystal structures. These distributions can be related to probabilities for the occurrence of certain interaction geometries among different functional groups. A fitting procedure is described that generates the descriptors in a fully automated way. For this purpose, we apply a similarity index that is tailored to the problem, the Split Hodgkin Index. It accounts separately for the similarity in regions of high propensity and in regions of low propensity. Although dependent on the division into these two subregions, the index is robust and performs better than the regular Hodgkin index. The reliability and coverage of the fitted descriptors were assessed using SuperStar. SuperStar usually operates on the raw IsoStar data to calculate propensity distributions, e.g., for a binding site in a protein. For our purpose, we modified the code so that it operates on our descriptors instead. This reduced the calculation time by a factor of five to eight compared to the original implementation. A validation procedure was performed on a set of 130 protein-ligand complexes, using four representative interacting probes to map the properties of the various binding sites: ammonium nitrogen, alcohol oxygen, carbonyl oxygen, and methyl carbon. The predicted 'hot spots' for the binding of these probes were compared to the actual arrangement of ligand atoms in experimentally determined protein-ligand complexes.
Results indicate that the version of SuperStar that operates on our descriptors is capable of predicting the above-mentioned atom types in ligands correctly, with success rates of 59% for all ligand atoms (regardless of their solvent accessibility) and 74% for the subset of solvent-inaccessible ones. If matches to ligand atoms of similar physicochemical properties are counted in addition to exact atom-type matches, the prediction rates rise to 75% and 89%, respectively. These rates are close to those obtained by the original SuperStar method (67% and 82%, respectively, for exact atom-type matches, and 81% and 91% for similar atom types).
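
The idea of scoring high- and low-propensity regions separately can be illustrated with a small sketch. The regular Hodgkin index for two gridded distributions a and b is the standard 2·Σab / (Σa² + Σb²). The abstract does not give the exact form of the Split Hodgkin Index, so the split variant below is an assumption made purely for illustration: the function name `split_hodgkin_index`, the `threshold` used to divide the grid, and the equal-weight average of the two regional indices are all hypothetical and may differ from the authors' definition.

```python
from typing import List, Sequence


def hodgkin_index(a: Sequence[float], b: Sequence[float]) -> float:
    """Regular Hodgkin index: 2*sum(a*b) / (sum(a^2) + sum(b^2))."""
    numerator = 2.0 * sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b)
    # Two all-zero distributions are identical, so treat them as fully similar.
    return numerator / denominator if denominator > 0.0 else 1.0


def split_hodgkin_index(
    reference: Sequence[float],
    fitted: Sequence[float],
    threshold: float,
) -> float:
    """Hypothetical split variant (assumption, not the published formula).

    Grid points are divided by the reference propensity into a high region
    (>= threshold) and a low region (< threshold). A Hodgkin index is
    computed for each non-empty region and the results are averaged with
    equal weight.
    """
    pairs = list(zip(reference, fitted))
    if not pairs:
        raise ValueError("distributions must contain at least one grid point")

    high = [(r, f) for r, f in pairs if r >= threshold]
    low = [(r, f) for r, f in pairs if r < threshold]

    regional: List[float] = []
    for region in (high, low):
        if region:
            ref_vals, fit_vals = zip(*region)
            regional.append(hodgkin_index(ref_vals, fit_vals))
    return sum(regional) / len(regional)


if __name__ == "__main__":
    # The fit reproduces the two high-propensity points but not the low ones.
    reference = [0.0, 1.0, 5.0, 5.0]
    fitted = [1.0, 0.0, 5.0, 5.0]
    print(hodgkin_index(reference, fitted))
    print(split_hodgkin_index(reference, fitted, threshold=2.0))
```

In this toy case the regular index stays close to 1 because the high-propensity points dominate both sums, whereas the split variant drops to 0.5 because the mismatch in the low-propensity region is scored on its own. This is the kind of behaviour that motivates treating the two regions separately, under the assumptions stated above.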

Full Text