PDB40: The Protein Data Bank celebrates its 40th birthday

Stephen K. Burley

doi:10.1002/bip.22182

Abstract

In June 1971, a symposium, entitled Structure and Function of Proteins at the Three Dimensional Level, was held at Cold Spring Harbor Laboratory on the north shore of Long Island in New York State. This meeting established the Protein Data Bank (PDB) as the singular archive for experimentally determined structures of biological macromolecules. The first dozen structures to enter the archive included myoglobin, hemoglobin, lysozyme, carboxypeptidase A, subtilisin, chymotrypsin, papain, pancreatic trypsin inhibitor, lactate dehydrogenase, rubredoxin, and cytochrome b5. All were X-ray structures: seven enzymes, one enzyme inhibitor, two electron transport proteins, and two oxygen-binding proteins.
Four decades later, PDB holdings total more than 85,000 structures encompassing a diverse ensemble of proteins of all shapes and sizes, nucleic acids, peptide-like antibiotics, multi-protein complexes, protein-nucleic acid complexes, macromolecular machines, viruses, and protein-drug complexes. This enormous collection of expertly annotated structures and much of the primary data underpinning their determination are available to all via the Internet at no charge. The Protein Data Bank is a commons, a resource belonging to or used by the community as a whole.
Recognizing that the first twelve were just the tip of the iceberg and appreciating that these structures exemplified interesting similarities and differences, the scientific community came together in 1971 to establish the PDB as the first open-access digital data resource for the biological and chemical sciences. In those days, each newly determined protein structure was greeted with the question “What does it look like?” With the exception of myoglobin and hemoglobin, each new addition to the archive looked completely different from its predecessors.
As PDB depositions accumulated, new insights regarding the physicochemical properties of linear polymeric macromolecules and the complex relationships between biological function and three-dimensional structure became evident. Structural biologists, as we now call ourselves, have been productively mining this ever-increasing treasure trove of data. With more than 85,000 structures in the archive and tens of millions of protein-encoding gene sequences available from the fruits of the human genome project, we still ask the question “What does it look like?” But now we want to know which, if any, extant PDB entries it resembles. Many of us are interested in seeing at the atomic level how multiple instances of the same polypeptide chain fold derived from a common ancestral gene can support so many different biological or biochemical functions. Simply put, we want to understand how “Function follows Form” in biology. Absent the communitarian foresight of the structuralists of 1971, our progress to more than 85,000 entries and the now-routine insights into how biological macromolecules have evolved would surely have been slower and more painful, perhaps impossible.
In 2009, Professor Elinor Ostrom shared the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel for “her analysis of economic governance, especially the commons”. Professor Ostrom developed a model of human economic behavior that precisely mirrors the hard work undertaken instinctively by the structural biology community to first establish the PDB and then make it a central open-access digital data resource for biological and biomedical researchers and educators. The archive came into existence through a common realization that data sharing was imperative for the success of this particular scientific enterprise. Until the 1990s, new structures often required tens of person-years' worth of effort, and that was after an ample supply of protein and high-quality crystals became available.
In establishing the PDB, the structural biology community came to terms with the fact that maximal productivity would only be possible if experimental results were shared in detail. As Ostrom recounted in her Nobel Lecture, the key to coming to a shared solution to an otherwise intractable problem is face-to-face communication. For the structural biology community, the many, many discussions that led up to the founding of the PDB concluded at Cold Spring Harbor in 1971. At the close of that landmark meeting, the attendees made voluntary commitments to deposit atomic coordinates into the newly established archive.
Given the myriad technological challenges and the time it took for some mindsets to adjust to new ideas regarding data sharing, it is not surprising that the growth of the archive was decidedly non-linear. (The 100th structure was released in 1978, the 1000th in 1992, the 10,000th in 1999, and the 50,000th in 2008.) During this period of non-linear growth, the structural community came together again at three critical junctures to strengthen the macromolecular structure archive for the common good. In the 1980s, several groups of structural biologists, led in no small part by the late Fred Richards (Yale University), vigorously campaigned to make structure deposition mandatory and contributed to the development of new publication guidelines, promulgated in 1989. This groundswell of opinion effectively pressured scientific journals into requiring deposition of macromolecular atomic coordinates and assignment of a four-character PDB ID as a pre-condition for publication of a scholarly article describing a new three-dimensional structure. Similarly, in 2008, the “wisdom of the crowd” created conditions ripe for acceptance of a community-spawned requirement that PDB IDs would only be issued when both atomic coordinates and the underpinning experimental data were deposited to the archive.
Third, in 2003, the leaders of the Research Collaboratory for Structural Bioinformatics PDB in the United States (Helen M. Berman), PDBe in the United Kingdom (Kim Henrick), and PDBj in Japan (Haruki Nakamura) came together to form the worldwide Protein Data Bank (wwPDB; www.wwpdb.org; BioMagResBank joined the wwPDB later, in 2006). This measure formalized globalization of the PDB, which began in 1971 as a collaborative effort between data centers located in the US and the UK. Subsequently, the leadership of the wwPDB established the wwPDB Foundation, which raises funds to support the outreach and educational missions of the global archive.
To mark the 40th anniversary of the founding of the PDB, the wwPDB Foundation organized the PDB40 Symposium at Cold Spring Harbor Laboratory in October 2011. Generous funding from individuals, twenty industrial sponsors, science funders in Japan, the US, and the UK, and the meeting host helped bring nearly 300 conferees and speakers together to celebrate the PDB's entry into middle age. The meeting demographics were almost as diverse as the 85,000-plus structures available today. Much of the funding was used to ensure a strong graduate student and postdoctoral fellow presence. At the opposite end of the age/experience spectrum, representation was similarly strong. Four of the 1971 attendees were present at the meeting, as were three of the scientists who deposited atomic coordinates into the archive for the first dozen PDB structures. Depositors of the first NMR-derived structure, the first electron microscopy-derived structure, and the first integral membrane protein structure were also present. The twenty invited speakers, drawn from both academe and industry, together have been responsible for more than 4,000 PDB entries. Their identities and presentation titles are provided in the conference program, which is reprinted after this editorial.
Last but by no means least, the meeting served as an opportunity for PDB scientists past and present to meet and compare notes. Most of the Directors and Associate Directors of the PDB Data Centers, past and present, chaired a meeting session. An annotated group photograph of PDB scientists in attendance follows the program.
This special PDB40 issue of Biopolymers publishes six invited contributions based on meeting presentations. Two contributions from long-standing PDB users underscore the richness of the archive and the manifold ways in which knowledge of protein structure/function relationships can be derived from the expertly annotated data contained therein. Richardson and Richardson in “Studying and Polishing the PDB's Macromolecules” recount their involvement as depositors, illustrators, evaluators, and end-users of PDB structures, with commentary on how best to study and draw scientific inferences from them. Furnham et al. in “Abstracting Knowledge from the Protein Data Bank” describe how the science and technology of protein structural analysis have developed over the past 40 years.
Two contributions come from laboratories in which NMR spectroscopy is used as a tool to understand the dynamical properties of viral proteins. Lorieau et al. in “The Impact of Influenza Hemagglutinin Fusion Peptide Length and Viral Subtype on its Structure and Dynamics” present an analysis of the portion of the viral coat protein responsible for nucleocapsid entry. They show that the 20-residue peptide is in a dynamic equilibrium between closed and open states, of which the former is thought to play an essential role in the membrane fusion process and the latter is thought to contribute to formation of the pore that the nucleocapsid passes through during viral entry.
Koharudin and Gronenborn in “Sweet Entanglements—Protein:Glycan Interactions in Two HIV-Inactivating Lectin Families” describe the results of a systematic NMR/X-ray crystallographic study of two lectin family members binding to the high-mannose glycans found on the surface of the HIV envelope glycoprotein gp120.
Searls in “A Primer in Macromolecular Linguistics” lucidly introduces the ways that linguistics can be and has been applied to molecular biology, covering the relevant formal language theory at a relatively nontechnical level. Analogies between macromolecules and human natural language are used to provide intuitive insights into the relevance of grammars, parsing, and analysis of language complexity to biology.
Finally, in a contribution entitled “The Future of the Protein Data Bank”, the wwPDB leadership (Berman, Kleywegt, Nakamura, and Markley) lay out what we can expect to see in terms of technical development and growth of the archive.
This special issue of Biopolymers marks an important milestone in the evolution of the Protein Data Bank. Its entry into middle age was celebrated by a vibrant multi-generational, multi-disciplinary scientific meeting that demonstrated convincingly that the archive continues on a most exciting trajectory. There is no “little red sports car” on the horizon! Instead, there is every reason to believe that the Protein Data Bank will remain at the center of research and teaching in cell and molecular biology. Moreover, the current wealth of depositions and their continued growth by nearly 10,000/year provide a highly productive vein to be worked by students of evolution and biodiversity. Thinking ahead more broadly, the Protein Data Bank also has an important, and as yet under-realized, role to play in helping to educate the public in areas such as human health and disease, agriculture, energy production, and the environment that are critical for our sustainability as a global society.
Stephen K. Burley, M.D., D.Phil.
Chair, Board of Directors, wwPDB Foundation
Michael Rossmann (Purdue University) The PDB: A historical perspective
Stephen K. Burley (Eli Lilly & Company) Growth, globalization, and future of the PDB
Janet Thornton (EMBL-European Bioinformatics Institute) Abstracting knowledge from protein structures for biology in the 21st century
David Baker (University of Washington) Scientific discovery by protein folding game players
Andrej Sali (University of California, San Francisco) Determining architectures of macromolecular assemblies by aligning interaction networks to electron microscopy density maps
Jane Richardson (Duke University Medical Center) Studying and polishing the PDB's macromolecules
Ad Bax (National Institute of Diabetes Digestive & Kidney Diseases/NIH) An NMR view of the interaction between & viral fusion proteins and phospholipids
Axel Brunger (Stanford University/HHMI) Challenges for structure determination at low resolution
Cheryl Arrowsmith (University of Toronto) Structural and chemical biology of the readers and writers of the histone code
Susan Taylor (University of California, San Diego) Evolution of protein kinases: Insights from the structural kinome
Soichi Wakatsuki (KEK Photon Factory Structural Biology Research Center) Coevolution of synchrotron radiation and crystallography
Richard Henderson (MRC Laboratory of Molecular Biology) What is needed to make single particle electron cryomicroscopy reach its true potential?
Wah Chiu (Baylor College of Medicine) CryoEM of molecular machines
Angela Gronenborn (University of Pittsburgh) Synergy between NMR and CryoEM: Novel findings for HIV capsid function
Johann Deisenhofer (UT Southwestern Medical Center) Remarks
Kurt Wüthrich (The Scripps Research Institute/ETH Zürich) Structural biology by NMR and the Protein Data Bank
Mei Hong (Iowa State University) Membrane protein solid-state NMR: Elucidating the influenza M2 structure and mechanism
David Searls (Independent Consultant) Macromolecular linguistics
Wayne Hendrickson (Columbia University) SLAC1 and the splendor of atomic resolution
Helen Berman (Rutgers University) Closing Remarks
Members of the PDB, past and present, in attendance at PDB40 (Photo by Constance Brukin).
Front Row: Francis Bernstein, Martha Quesada, Gerard J. Kleywegt, Tom Koetzle, Helen M. Berman, Haruki Nakamura, John Markley, Miri Hirshberg, Joel Sussman.
Second Row: Judith L. Flippen-Anderson, Peter Rose, Gary L. Gilliland, T.N. Bhat, Jasmine Young, Buvaneswari Coimbatore Narayanan, Monica Sekharan, Irina Persikova, Sutapa Ghosh.
Third Row: Spencer Bliven, Shuchismita Dutta, Guanghua Gao, Zukang Feng, Tom Oldfield, David Micallef, Luigi Di Costanzo, Catherine L. Lawson, Sanchayita Sen, Christine Zardecki, Chisa Kamada.
Fourth Row: Wolfgang F. Bluhm, Chunxiao Bi, Chenghua Shao, Dimitris Dimitropoulos, Andreas Prlić, Geoffrey Barton, Sameer Velankar, Brian Hudson, Vladimir Guranović, John Westbrook, Philip E. Bourne.
Back Row: Margaret Gabanyi, Eldon Uhrich, Genji Kurisu, Atsushi Nakagawa, Nomi Ron, Lihua Tan, Maria Voigt, Huanwang Yang, Rachel Kramer Green, Greg Quinn, David S. Goodsell.
