Abstract

Perspective

The Roots of Bioinformatics in Protein Evolution

Russell F. Doolittle 1,2 *

1 Department of Chemistry & Biochemistry, University of California, San Diego, La Jolla, California, United States of America
2 Department of Molecular Biology, University of California, San Diego, La Jolla, California, United States of America

Introduction

Bioinformatics as a formal discipline came of age in the late 1980s, greatly stimulated by the 1989 Human Genome Initiative. The roots of the field go back several decades earlier, however, to an era when computers were not needed to manage the data. In this personal reflection, I review the confluence of events beginning in the 1950s that brought a number of fields together in a common pursuit. Particularly, I offer some comments about early amino acid sequence comparisons, the results of which revealed so much about evolution, and how the computer became necessary only when the number of known sequences began to grow exponentially.

Many other authors have already recorded their thoughts on the evolutionary roots of bioinformatics in accounts that are doubtless more thorough and balanced than can be recorded in this brief personal reflection ([1,2], inter alia). All are in agreement about certain pivotal events that were true milestones: the double-helix model of DNA, the first determination of the amino acid sequence of a protein, and the conceptual linking of DNA sequences and protein sequences. My plan is to expand on some related matters with the hope of providing some additional background on those early scenes.

Sequences

Sequences, the simple order of individual units in biological polymers, are at the heart of bioinformatics, and the search for relationships among them and the reconstruction of their histories has arguably proved the most informative of biological inquiries. Today dozens of giant data banks store what seem to be countless numbers of nucleic acid and protein sequences.
But there was a time, only 50 or 60 years ago, when hardly any sequences were known at all. Nonetheless, there were those who already appreciated that the web of all life would eventually be reconstructed on the basis of sequence data alone. There was an obligatory progression of events, beginning with chemistry, then biology, and, finally, the need for computers.

Among the technological advances that made sequence determinations possible, two are extremely notable: the introduction in the 1940s of paper chromatography as a simple tool for identifying amino acids and their derivatives [3], for one, and the use of suitable chemical reagents that reacted (more or less) exclusively with amino groups, for another, particularly an amino-tagging reagent by Sanger [4] and an amino acid-labilizing reagent by Edman [5]. Some important details of their seminal and unique contributions need to be described here, however briefly.

Chemistry

It must be difficult for a young scientist today to imagine how primitive circumstances were in the mid-20th century. The effort needed to determine even a short amino acid sequence was more than considerable; it was daunting (some of that tedium may carry through in the following description).

Typically, the first step in determining the sequence of a peptide or protein was to establish its amino acid composition. It was well known that heating a protein or peptide with strong aqueous acid broke the bonds between the constituent amino acids (unhappily, glutamines and asparagines were changed into glutamic and aspartic acids in the process, and a few other amino acids like tryptophan damaged). The resulting hydrolysate could be spotted on a large piece of filter paper and separation of the various amino acids obtained by letting an organic solvent creep over the paper, partitioning the amino acids according to their relative solubilities in one phase or the other.
The locations of the amino acids could be found by staining the dried paper with ninhydrin, a compound that gave a blue color with amino groups.

After a preliminary amino acid composition was in hand, the next step was to break the protein or peptide into smaller pieces (the "divide and conquer" strategy). The simplest method was to use partial acid hydrolysis, taking advantage of the fact that bonds next to some amino acids break more easily than others. The other popular option was to use proteolytic enzymes like trypsin or chymotrypsin. In either case, the peptide fragments were purified, often by paper chromatography, and their individual amino acid compositions determined.

Indeed, one reason that protein sequences were attacked first, rather than RNA or DNA, was because there were 20 different amino acids, and a random, partial hydrolysis of a polypeptide chain could give rise to smaller peptides with unique compositions. The logistics of the same approach for a polymer made of only four different things was impossible to contemplate.

More Chemistry

The Sanger reagent, fluorodinitrobenzene (FDNB), had several important features. First, the bond between it and the tagged amino acid was resistant to acid hydrolysis; second, the derivatized amino acid was sufficiently non-polar that it could be extracted from the acid hydrolysate with an organic solvent like ether; and finally, the derivatives were bright yellow and could be readily identified by paper chromatography. The operation could be conducted on the starting peptide or protein, as well as on the fragments generated by various means. It was a slow

Citation: Doolittle RF (2010) The Roots of Bioinformatics in Protein Evolution. PLoS Comput Biol 6(7): e1000875. doi:10.1371/journal.pcbi.1000875
Editor: David B. Searls, Philadelphia, United States of America
Published July 29, 2010
Copyright: © 2010 Russell F. Doolittle. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The author received no specific funding for this work.
Competing Interests: The author has declared that no competing interests exist.
* E-mail: rdoolittle@ucsd.edu
PLoS Computational Biology | www.ploscompbiol.org July 2010 | Volume 6 | Issue 7 | e1000875
