Introduction

Over 35 years have passed since the 'central dogma' of molecular biology (DNA makes RNA makes protein) was proposed (Crick, 1958). Despite its remarkable verification, it is being seen increasingly as limited, for if the whole flow of information in a cell were unidirectional, all cells with the same complement of genetic material would have identical function and morphology. The truth is manifestly otherwise.

A group of proteins, transcription factors, selects the information used in cells by specifically binding to 'regulatory' DNA sequences. Among other effects, this causes the differentiation of cells. These factors act as the final messenger in a transduction pathway of signals which come from outside the cell. Thus, gene expression can be regulated by the environment.

Recognition between a transcription factor and its target DNA is achieved through the physical interaction of the two molecules. Since the structures of both DNA and proteins are determined by their primary sequences, there must be a set of rules to describe DNA-protein interactions entirely on the basis of sequences. The fundamental question is whether these rules are simple and comprehensible, such that the DNA recognition code can be compared with the triplet code which summarizes the rules of how DNA and protein sequences are related in the central dogma.

As we review in this paper, a simple code for DNA recognition by transcription factors does seem to exist. In fact, the recognition rules allow us (i) to predict DNA-protein interactions, (ii) to change the binding specificity of an existing transcription factor, and (iii) probably even to design in a rational way a new protein which binds to a particular DNA sequence. The code has been derived from crystal structures of transcription factor-DNA complexes (Table I) and the vast body of biochemical, genetic and statistical information about the binding specificity of transcription factors.
Most of the transcription factors discussed here use an α-helix, which binds to the DNA major groove, for recognition. Those proteins which have a 'recognition helix' discussed here fall mainly into four families: probe helix (PH), helix-turn-helix (HTH), zinc finger (ZnF) and C4 Zn binding proteins (C4). There is, in addition, one transcription factor family described that uses a β-sheet, the MetJ repressor-like (MR) family. [See Table I for members of these and other families. Note that (i) individual Zn fingers are further subdivided into A and B fingers, AF and BF (Suzuki et al., 1994a), (ii) the PH family includes homeodomain and basic-zipper proteins (Suzuki, 1993) and (iii) the C4 family includes the hormone receptors and the GATA proteins (Suzuki and Chothia, 1994).]