Beddell et al. [1] published the first structure-based design paper in 1976, on hemoglobin ligands, using Kendrew wireframe models. The first protein–ligand docking paper appeared in 1982 [2]. The PDB contained about 200 protein X-ray crystal structures in 1982, but very few were drug discovery targets. Many medicinal chemists believed that relevant macromolecular crystal structures were unlikely ever to become available early enough in the lifetime of a drug discovery project to matter. Many also thought that designing an optimal ligand would be trivially obvious if they were lucky enough to have the high-resolution structure of their target. The initial flurry of enthusiasm in the 1980s around docking and de novo design yielded to the realization in the 1990s that scoring is the hard problem, not design. It’s relatively easy to design a molecule to optimize any given scoring function; the hard part is recognizing which apparently complementary molecules will actually bind to their target. This is the crux of the still-unsolved scoring problem. Predicting free energies of binding in aqueous solution has proven to be far more difficult than most of us realized. Despite 30 years of effort with many different approaches, many different investigators, and massive amounts of computer time, we still cannot predict relative binding free energies accurately enough to drive organic synthesis during hit-to-lead optimization. CASP (Critical Assessment of protein Structure Prediction, http://predictioncenter.org) showed the protein folders in 1994 that true, blind prediction is much harder than retrospective “prediction” [3]. Their field has advanced steadily since then. Our own field has only recently begun to learn this, thanks to the CCDC (Cambridge Crystallographic Data Centre) and SAMPL (http://sampl.eyesopen.com) “contests”, plus other critical analyses (e.g., [4]). Blind predictions for small molecule crystal structures began with the CCDC in 1999 [5].
Anthony Nicholls, Vijay Pande, and others started SAMPL0 in 2007 to predict small molecule solvation free energies. SAMPL3 was held this year and included blind prediction of host–guest complexes and trypsin–fragment binding. The initial CCDC contest results were disappointing, but helped stimulate dramatic progress. The most recent CCDC contests proved that high-resolution, accurate, blind prediction of crystal structures is possible for some small organic molecules, albeit with heroic amounts of computing [6, 7]. This is really exciting work: it proves that current theory and its implementation are sufficient to solve this problem in several non-trivial cases. SAMPL0–3 showed that true blind predictions of solvation free energies, host–guest complexes, protein–fragment structures, and relative binding affinities are still remarkably difficult and typically have surprisingly large errors [8–10]. Literature in our field is still cluttered with work claiming to predict what’s already known. It is not even clear where the major errors are (partial charges, dielectric model, fixed versus polarizable force fields, torsion terms, entropy changes due to torsional degrees of freedom, water model, etc.). We still don’t understand why protein–ligand binding free energy doesn’t continue to increase with increasing size of the binding interface [11]. Brute force computing has not helped. Major improvements in our methods are required. The PDB now has over 75,000 structures, of which about 50,000 are protein–ligand complexes (http://www.pdb.org). High-resolution X-ray crystal structures are now routinely
J. Blaney, Computational Chemistry and Cheminformatics, Small Molecule Drug Discovery, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA; e-mail: blaney.jeff@gene.com