Abstract

We developed a new computational algorithm for the accurate identification of ligand binding envelopes rather than surface binding sites. We performed a large-scale classification of the identified envelopes according to their shape and physicochemical properties. The prediction algorithm, called PocketFinder, uses a transformation of the Lennard-Jones potential calculated from a three-dimensional protein structure and does not require any knowledge of a potential ligand molecule. We validated the algorithm on two systematically collected data sets of ligand binding pockets from the Protein Data Bank: 5616 from complexed (bound) structures and 11,510 from uncomplexed (apo) structures. In total, 96.8% of the experimental binding sites were predicted at an overlap level better than 50%. Furthermore, 95.0% of the asserted sites in the apo receptors were predicted at the same level. We demonstrate that conformational differences between the apo and bound pockets do not dramatically affect the prediction results. The algorithm can be used to predict ligand binding pockets in uncharacterized protein structures, suggest new allosteric pockets, evaluate the feasibility of inhibiting protein-protein interactions, and prioritize molecular targets. Finally, the database of the known and predicted binding pockets for the human proteome structures, the human pocketome, was collected and classified. The pocketome can be used for rapid evaluation of possible binding partners of a given chemical compound.
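To make the grid-based idea concrete, here is a minimal Python sketch of a pocket search in the spirit of PocketFinder. It is not the authors' implementation: the probe parameters (`EPSILON`, `R_MIN`, `CUTOFF`), the assumed transformation (clashing grid points score 0, attractive points score -E), the 3x3x3 smoothing, the contour at a `fraction` of the maximum, and the voxel-based `overlap` measure are all hypothetical choices made for illustration. A prediction could be counted as a hit when `overlap` exceeds 0.5, mirroring the 50% level quoted above, although the paper's exact overlap definition may differ.

```python
import math
from itertools import product

# Hypothetical parameters for illustration only; not values from the paper.
EPSILON = 0.2   # probe well depth (kcal/mol)
R_MIN = 4.0     # probe-atom distance at the potential minimum (angstrom)
CUTOFF = 8.0    # atoms farther than this from a grid point are ignored


def lj_energy(point, atoms):
    """12-6 Lennard-Jones energy of a probe at `point`, summed over atoms."""
    energy = 0.0
    for atom in atoms:
        r = math.dist(point, atom)
        if r > CUTOFF:
            continue
        x = (R_MIN / max(r, 0.5)) ** 6  # guard against r == 0
        energy += EPSILON * (x * x - 2.0 * x)
    return energy


def score_grid(atoms, origin, n, spacing):
    """Map each (i, j, k) grid index to a non-negative score.

    Assumed transformation: clashing (positive-energy) points score 0 and
    attractive points score -E, so concave regions lined by many atoms
    score highest.
    """
    scores = {}
    for idx in product(range(n), repeat=3):
        point = tuple(o + i * spacing for o, i in zip(origin, idx))
        scores[idx] = max(0.0, -lj_energy(point, atoms))
    return scores


def smooth(scores, n):
    """Average each voxel with its in-bounds 3x3x3 neighbourhood."""
    out = {}
    for i, j, k in scores:
        vals = [scores[(i + a, j + b, k + c)]
                for a, b, c in product((-1, 0, 1), repeat=3)
                if 0 <= i + a < n and 0 <= j + b < n and 0 <= k + c < n]
        out[(i, j, k)] = sum(vals) / len(vals)
    return out


def envelope(smoothed, fraction=0.5):
    """Voxels whose smoothed score reaches `fraction` of the maximum."""
    level = fraction * max(smoothed.values())
    return {idx for idx, s in smoothed.items() if s >= level and s > 0}


def overlap(ligand_atoms, env, origin, spacing):
    """Fraction of ligand-occupied voxels that lie inside the envelope."""
    voxels = {tuple(round((c - o) / spacing) for c, o in zip(atom, origin))
              for atom in ligand_atoms}
    return len(voxels & env) / len(voxels)


def fibonacci_shell(n_atoms, radius):
    """Toy 'receptor': atoms spread evenly on a sphere around a cavity."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    atoms = []
    for i in range(n_atoms):
        z = 1.0 - 2.0 * (i + 0.5) / n_atoms
        rho = math.sqrt(1.0 - z * z)
        atoms.append((radius * rho * math.cos(golden * i),
                      radius * rho * math.sin(golden * i), radius * z))
    return atoms


if __name__ == "__main__":
    ORIGIN, N, SPACING = (-12.0, -12.0, -12.0), 13, 2.0
    receptor = fibonacci_shell(150, 6.0)
    env = envelope(smooth(score_grid(receptor, ORIGIN, N, SPACING), N))
    print(len(env), overlap([(0.3, 0.2, -0.1)], env, ORIGIN, SPACING))
```

In this toy case the cavity centre, where every receptor atom sits near the attractive well, should fall inside the envelope, while solvent points outside the shell should stay below the contour. Smoothing before contouring is one simple way to suppress isolated high-scoring voxels; a real implementation would tune the grid spacing, probe parameters and contour level against known complexes.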

Highlights

  • We developed a new computational algorithm for the accurate identification of ligand binding envelopes rather than surface binding sites

  • An increasing number of protein structures are becoming available from high-throughput structural genomics projects before their biological function has been characterized

  • We applied additional filters that did not significantly reduce the size of the data set but cleaned it up as follows: (i) heteromolecules far away from the receptors were removed; (ii) heteromolecules that contact symmetry-related copies of the receptor were removed, because their binding sites are formed between asymmetric units and building a correct model would require biological information; (iii) ion clusters were removed; and (iv) duplicate combinations of Protein Data Bank entries and ligands were removed
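The four clean-up filters in the last highlight can be expressed as a single pass over per-heteromolecule records. This is a minimal sketch that assumes those records have already been derived from the PDB files; the `HetRecord` fields, the `clean` function, the `MAX_CONTACT_DISTANCE` value (the source does not say how "far away" was defined) and the toy records in the usage check are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source does not state the distance threshold.
MAX_CONTACT_DISTANCE = 4.0  # angstrom


@dataclass(frozen=True)
class HetRecord:
    pdb_id: str
    ligand: str
    min_distance: float           # closest approach to any receptor atom (angstrom)
    contacts_symmetry_mate: bool  # touches a symmetry-related copy of the receptor
    in_ion_cluster: bool


def clean(records):
    """Apply filters (i)-(iv) in order, keeping the first of any duplicates."""
    seen = set()
    kept = []
    for rec in records:
        if rec.min_distance > MAX_CONTACT_DISTANCE:  # (i) far from the receptor
            continue
        if rec.contacts_symmetry_mate:               # (ii) site spans asymmetric units
            continue
        if rec.in_ion_cluster:                       # (iii) ion cluster
            continue
        # (iv) duplicate entry/ligand pair; PDB IDs are case-insensitive
        key = (rec.pdb_id.upper(), rec.ligand.upper())
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Running the filters in this order means a duplicate is only detected among records that already passed filters (i) to (iii), so a valid copy is never discarded because an invalid one was seen first.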


Summary

Introduction

We developed a new computational algorithm for the accurate identification of ligand binding envelopes rather than surface binding sites. A benchmark test based on a large, systematic data set of apo structures is necessary for evaluating protein-ligand binding site identification methods.
