Abstract

Last March, I attended a symposium at Rockefeller University on the analysis of genetic sequences. It dealt with how computers are used to store and analyze DNA and protein sequences. Shortly thereafter I learned that ABT was planning a special issue on computers. I searched my memory and my journals for other examples of biology's use of computers. No matter where I looked, I found that the computer was not only facilitating the storage and manipulation of data, but also making possible things that were only dreams a few years ago. Computers have greatly increased the biologist's capacity to investigate and understand nature, and the most exciting thing about this development is that it's only beginning! In this column, I will review some of the discoveries I made about computer applications, hoping that at least a few of them will be new to you also.

The Rockefeller symposium was opened by the university's president, Joshua Lederberg, a strong advocate of computer literacy for molecular biologists. He described some analyses of protein sequences he'd done in the 1960s. This early attempt seems crude by today's standards. Computer technology and software were much less sophisticated, the number of protein sequences available for analysis was very small, and DNA sequencing was in its infancy. Computer capability and molecular biology's need for it have grown together over the past 20 years. It is estimated that known DNA sequences total 3 million bases, and that this total will double shortly (Science, March 30, 1984). British biologists are already finding the available computer facilities too limited (Nature, May 17, 1984), and in the United States, the National Institutes of Health has awarded a contract to IntelliGenetics to establish BIONET, which will give researchers access to national databases on DNA and protein sequences and will provide a software library for use in sequence searching, matching, and manipulation.
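To give a flavor of what "sequence searching and matching" software does, here is a minimal sketch in Python. It is not BIONET's software or any program named in this column. It assumes the simplest possible approach: slide a query along each stored sequence without gaps and report the stretch with the highest fraction of identical positions. The function names, the toy database, and the 80 percent threshold are all invented for illustration.

```python
# Illustrative only: a naive, ungapped similarity search.
# All names, sequences, and the threshold below are hypothetical.

def best_ungapped_match(query, subject):
    """Slide query along subject and return (best identity fraction, offset).

    Returns (0.0, -1) if the query is longer than the subject.
    """
    if not query:
        raise ValueError("query must not be empty")
    best_identity, best_offset = 0.0, -1
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        # Count positions where query and window carry the same letter.
        matches = sum(a == b for a, b in zip(query, window))
        identity = matches / len(query)
        if identity > best_identity:
            best_identity, best_offset = identity, offset
    return best_identity, best_offset


def search_database(query, database, threshold=0.8):
    """Return (name, offset, identity) for every sequence that matches the
    query at or above the threshold, best match first."""
    hits = []
    for name, sequence in database.items():
        identity, offset = best_ungapped_match(query, sequence)
        if identity >= threshold:
            hits.append((name, offset, identity))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# A made-up three-entry "database" of short DNA sequences.
TOY_DATABASE = {
    "seq_a": "ATGGCGTACGTTAGC",
    "seq_b": "TTTTACGTAAGCCCC",
    "seq_c": "GGGGGGGGGGGGGGG",
}

if __name__ == "__main__":
    for name, offset, identity in search_database("ACGTTAGC", TOY_DATABASE):
        print(f"{name}: offset {offset}, identity {identity:.3f}")
```

Real search programs go well beyond this sketch: they allow for insertions and deletions and weigh how likely a match is to arise by chance. The basic idea of comparing a query against every stored sequence is the same.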
DNA sequencing is proceeding so rapidly, with so much data accumulating, that only computers can give scientists the capacity to organize the data and make sense of it. In a recent review, researchers at Caltech's Microchemical Facility describe a highly automated system they've developed for the analysis and synthesis of genes and proteins (Nature, July 12, 1984). As they say, "The importance of computer-aided data analysis to the operation of the Microchemical Facility cannot be overemphasized." For example, DNAMST is a collection of programs for the input, analysis, and display of nucleic acid and protein sequences. It can determine the amino acid sequences encoded by particular DNA sequences. Such protein-coding sequences are called open reading frames. DNAMST can identify these regions even when their protein products have yet to be found. In fact, knowing the amino acid sequences can give researchers clues to finding the proteins; the sequences can be used to predict the proteins' probable molecular weights and hydrophobic or hydrophilic properties.

Richard J. Roberts, speaking at the Rockefeller symposium, also stressed the importance of computers in describing his work on sequencing the adenovirus 2 genome. This genome, which is about 36,000 base pairs long, is tiny compared to that of a eukaryote, but even here the information would become unwieldy without computers (Cold Spring Harbor Symposia on Quantitative Biology, 1983). Since several groups are sequencing this virus, computer searches are done to find discrepancies in the results. The nucleotide sequence can also be searched for features that are characteristic of particular functions, for example, control sequences for the initiation or termination of transcription. In other words, the sequence is not an end in itself, but the starting point for analysis of the virus's functions and mechanisms (F. Sanger in Cold Spring Harbor Symposia on Quantitative Biology, 1983).

Russell F. Doolittle has been described as an amateur in the analysis of amino acid sequences, and his Newat database will probably soon be supplanted by BIONET. But, like many amateurs, he brings so much enthusiasm to his work that he has accomplished a great deal. At the symposium he described some of his discoveries. When the sequence of the simian sarcoma virus oncogene, v-sis, was published, he searched the Newat database for similar sequences and found that the oncogene was homologous to a platelet-derived growth factor (Science, July 15, 1983). This and other evidence have made the relationship between some oncogenes and growth factors an important clue to understanding how oncogenes can cause cells to become cancerous (Scientific American, August 1984). Because he is willing to search his databank on behalf of researchers looking for sequences similar to the ones they are working on, Doolittle was involved in comparing the amino acid sequences of the repressor and cro proteins of the bacteriophages λ, 434, and P22 (Nature, July 29, 1982). All these DNA-binding proteins show homologies in the portions of the sequences that bind DNA. The researchers concluded that these sequences are common to all DNA-binding proteins, and subsequent work on other proteins has borne out this conclusion. Doolittle also addressed the more
