Abstract

Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6–8 bases. However, alignment of promoter elements controlled by the yeast genes ste11 and Rox1 has indicated strict conservation of a larger DNA motif. By site selection, we identify a highly specific 12-base pair motif for Ste11, AGAACAAAGAAA. Similarly, we show that Tcf1, MatMc, and Sox4 bind unique, highly specific DNA motifs of 12, 12, and 10 base pairs, respectively. Footprinting with a deletion mutant of Ste11 reveals a novel interaction between the 3′ base pairs of the extended DNA motif and amino acids C-terminal to the HMG domain. The sequence-specific interaction of Ste11 with these 3′ base pairs contributes significantly to binding and bending of the DNA motif.

Abbreviations: HMG, high mobility group; PCR, polymerase chain reaction; DMS, dimethyl sulfate; bp, base pair.

Tjian and co-workers (1 Jantzen H.M.
Admon A. Bell S.P. Tjian R. Nature. 1990; 344: 830-836) originally recognized the HMG box in the RNA polymerase I transcription factor UBF as a novel type of DNA-binding domain. UBF carries several regions of homology to high mobility group-1 proteins. One of these so-called HMG box regions was shown to mediate binding to a DNA-affinity column. Numerous HMG box proteins have since been identified. An evolutionary study of the HMG box family has led to the notion that two types of HMG box proteins can be distinguished (2 Laudet V. Stehelin D. Clevers H. Nucleic Acids Res. 1993; 21: 2493-2501). The first type usually contains multiple HMG boxes that bind DNA in a structure-dependent but sequence-independent fashion. Prototype factors are UBF and HMG-1. The second type of protein contains a single HMG box that can bind to DNA in a sequence-specific fashion. Examples of the latter are encoded by the Tcf/Lef family members, the mammalian sex-determining gene Sry and related Sox genes, and by several genes involved in fungal mating-type determination such as the Schizosaccharomyces pombe genes matmc and ste11 (2, 3 Grosschedl R. Giese K. Pagel J. Trends Genet. 1994; 10: 94-100). The HMG box binds DNA as a monomer (2, 3). Various biochemical experiments involving footprinting techniques and (A/T)-to-(I/C) substitutions have demonstrated that sequence-specific HMG boxes bind predominantly in the minor groove of the DNA helix (4 Giese K. Amsterdam A. Grosschedl R. Genes Dev.
1991; 5: 2567-2578, 5 van de Wetering M. Clevers H. EMBO J. 1992; 11: 3039-3044). Circular permutation assays indicated the concomitant induction of a strong bend of approximately 70–130° in the DNA helix (6 Denny P. Swift S. Connor F. Ashworth A. EMBO J. 1992; 11: 3705-3712, 7 Dooijes D. van de Wetering M. Knippels L. Clevers H. J. Biol. Chem. 1993; 268: 24813-24817, 8 Ferrari S. Harley V.R. Pontiggia A. Goodfellow P.N. Lovell-Badge R. Bianchi M.E. EMBO J. 1992; 11: 4497-4506, 9 Giese K. Cox J. Grosschedl R. Cell. 1992; 69: 185-195, 10 Lnenicek-Allen M. Read C.M. Crane-Robinson C. Nucleic Acids Res. 1996; 24: 1047-1051). Based on methylation interference footprinting performed for the T-cell-specific transcription factor Tcf1, we originally proposed a heptamer-binding site for sequence-specific HMG boxes, (A/T)(A/T)CAAAG (11 van de Wetering M. Oosterwegel M. Dooijes D. Clevers H. EMBO J. 1991; 10: 123-132). Subsequent studies on Lef-1, Sry, Sox4, Sox-5, and MatMc were in agreement with this notion (e.g. Refs. 4, 6, 7, 12 Connor F. Cary P.D. Read C.M. Preston N.S. Driscoll P.C. Denny P. Crane-Robinson C. Ashworth A. Nucleic Acids Res. 1994; 22: 3339-3346, 13 Harley V.R. Lovell-Badge R. Goodfellow P.N. Nucleic Acids Res.
1994; 22: 1500-1501, 14 van de Wetering M. Oosterwegel M. van Norren K. Clevers H. EMBO J. 1993; 12: 3847-3854). A number of NMR studies have provided a structural basis for the unusual mode of DNA binding by HMG boxes. The non-sequence-specific HMG boxes of HMG-1 (15 Read C.M. Cary P.D. Crane-Robinson C. Driscoll P.C. Norman D.G. Nucleic Acids Res. 1993; 21: 3427-3436, 16 Weir H.M. Kraulis P.J. Hill C.S. Raine A.R.C. Laue E.D. Thomas J.O. EMBO J. 1993; 12: 1311-1319), Saccharomyces cerevisiae NHP6A (17 Allain F.H.-T. Yen Y.-M. Masse J.E. Schultze P. Dieckman T. Johnson R.C. Feigon J. EMBO J. 1999; 18: 2563-2579), and Drosophila HMG (18 Jones D.N. Searles M.A. Shaw G.L. Churchill M.E. Ner S.S. Keeler J. Travers A.A. Neuhaus D. Structure. 1994; 2: 609-627) were found to adopt highly similar structures consisting of three α-helices, arranged in an unusual L-shape or arrowhead. Based on the presence of very similar secondary structural elements, an analogous structure was suggested for the sequence-specific HMG box of Sox-5 (12). The structure of the non-complexed Sox4 HMG box (19 van Houte L.P.A. Chuprina V.P. van der Wetering M. Boelens R. Kaptein R. Clevers H. J. Biol. Chem. 1995; 270: 30516-30524) confirmed the similarity in overall structure to the three α-helix/L-shapes of HMG-1. A major difference is a shortened third helix in Sox4, caused by a helix-breaking proline residue conserved between all sequence-specific HMG boxes (2 Laudet V. Stehelin D.
Clevers H. Nucleic Acids Res. 1993; 21: 2493-2501), followed by an irregularly structured C terminus. The reported NMR structures of Sry·DNA (20 Werner M.H. Huth J.R. Gronenborn A.M. Clore G.M. Cell. 1995; 81: 705-714) and Lef-1·DNA (21 Love J.J. Li X. Case D.A. Giese K. Grosschedl R. Wright P.E. Nature. 1995; 376: 791-795) complexes have elucidated the nature of the sequence-specific HMG box-DNA interaction. One arm of the L-shape is formed by helix 1 and helix 2 and the second by helix 3 and the adjacent extended C-terminal segment. The concave surface of the twisted L-shape is docked into a widened minor groove of a strongly bent DNA molecule. Most base contacts occur with the helix 1/2 region in the minor groove. For Lef-1, the studied DNA motif was TTCAAAGG. Sry was studied in complex with its CAAAC core DNA motif. The bending of the DNA helix away from the HMG box is, in part, mediated through the intercalation of a hydrophobic residue (Met in Lef-1; Ile in Sry) between the second and third base pairs of the CAAAG/CAAAC core. Additional base contacts are mediated by a tyrosine residue located C-terminal to helix 3 (Tyr-74 in Sry; Tyr-75 in Lef-1) and occur with AT base pairs directly 5′ of the core cognate DNA motif. A major difference between the proposed structures for Sry and Lef-1 lies in their C termini. A unique feature in the Lef-1 structure is a contact of Arg-81 with the backbone phosphate directly 3′ of the core. Thus, the irregularly structured C terminus of Lef-1 makes a sequence-specific contact 5′ of the cognate core through Tyr-75 but also mediates a backbone contact 3′ of the core through Arg-81. This is possible only because the two ends of the DNA motif are brought together by bending.
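As an illustration of the heptamer consensus (A/T)(A/T)CAAAG discussed above, the following Python sketch scans a sequence and its reverse complement for that pattern. It is not part of the original study; the function names `reverse_complement` and `find_heptamers` are hypothetical and chosen only for this example, and plain pattern matching says nothing about binding affinity.

```python
import re

# Heptamer consensus (A/T)(A/T)CAAAG written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
HEPTAMER = re.compile(r"(?=([AT][AT]CAAAG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_heptamers(seq: str):
    """List (strand, start, match) for every heptamer hit.

    `start` is 0-based and counted 5'->3' on the strand that was scanned:
    "+" is the sequence as given, "-" is its reverse complement.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in HEPTAMER.finditer(s):
            hits.append((strand, m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    # Lef-1 motif quoted in the text
    print(find_heptamers("TTCAAAGG"))
```

For the Lef-1 motif TTCAAAGG this reports one hit on the given strand starting at position 0.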
Many HMG box factors have been demonstrated to bind DNA in a sequence-specific fashion and to transactivate transcription in transient co-transfection assays (e.g. Refs. 14, 22 Dubin R.A. Ostrer H. Mol. Endocrinol. 1994; 8: 1182-1192, 23 Giese K. Grosschedl R. EMBO J. 1993; 12: 4667-4676, 24 Molenaar M. van de Wetering M. Oosterwegel M. Peterson-Maduro J. Godsave S. Korinek V. Roose J. Destree O. Clevers H. Cell. 1996; 86: 391-399, 25 Yen Y.-M. Wong B. Johnson R.C. J. Biol. Chem. 1998; 273: 4424-4435). The S. pombe Ste11 transcription factor has provided a unique opportunity to study the in vivo effects of HMG box genes. ste11 is indispensable for the nitrogen starvation response and mating type determination of the fission yeast. Multiple target genes for Ste11 have been found, based on their dependence on Ste11 protein and/or on intact Ste11-binding sites in the respective promoters. Kjaerulff et al. (26 Kjaerulff S. Dooijes D. Clevers H. Nielsen O. EMBO J. 1997; 16: 4021-4033) compared the Ste11-binding sites from these biological target genes and thereby defined the consensus Ste11 “response element” (or TR box) as AACAAAGAAA. This consensus TR box compared well with the original consensus DNA motif YAACAAAGAA (27 Sugimoto A. Iino Y. Maeda T. Watanabe Y. Yamamoto M. Genes Dev. 1991; 5: 1990-1999), which was based on the alignment of 10 conserved elements in the promoters of genes induced upon sexual development of S. pombe. Thus, the TR box was considerably longer than the 6–8-bp motifs reported for other HMG box proteins (2 Laudet V. Stehelin D. Clevers H.
Nucleic Acids Res. 1993; 21: 2493-2501). Based on similar considerations, a 12-bp motif has been proposed for the S. cerevisiae protein Rox1, GAGAACAATYYY (28 Balusubramanian B. Lowry C.V. Zitomer R.S. Mol. Cell. Biol. 1993; 13: 6071-6078). The available biochemical and structural data for HMG boxes do not predict these extended cognate DNA motifs. In order to provide a biochemical basis for physical recognition of this biologically defined DNA motif, we have performed binding site selections with Ste11, Tcf1, MatMc, and Sox4. Furthermore, binding characteristics of the Ste11 HMG box to the selected 12-base pair motif were analyzed by methylation interference footprinting, by circular permutation, and by competition experiments in a gel retardation assay.

EXPERIMENTAL PROCEDURES

Recombinant HMG Box Proteins—Production of glutathione S-transferase fusion protein Ste113 (Ste11 amino acids 1–113) (26), MalE-MatMc-HMG fusion protein (7), and the His-tagged HMG boxes of Sox4 (14) and Tcf1 (29 van Houte L. van Oers A. van de Wetering M. Dooijes D. Kaptein R. Clevers H. J. Biol. Chem. 1993; 268: 18083-18087) has been described elsewhere. The deletion mutant Ste91, encoding amino acids 1–91 of the Ste11 clone, was generated by PCR using the primers 5′ GCACCCGGGTCTGCTTCTTTAACAGCC 3′ and 5′ GCAGAATTCTCTACGAACAGTAGACCG 3′. The PCR product was subsequently cloned into the SmaI and EcoRI restriction sites of pGEX-2T (Amersham Pharmacia Biotech).
The GST-Ste91 plasmid was introduced into Escherichia coli strain DH5, and the protein was induced and purified according to the manufacturer's instructions.

PCR-mediated Site Selection—A random probe was generated by labeling primer A (5′ GTTACCGCGGATCCGAATTCCC 3′) with [32P]ATP using T4 polynucleotide kinase and subsequent annealing of this primer A to primer BSS (5′ CTCGGTACCTCGAGTGAAGCTTGANNNNNNNNNNNNNNNNNNGGGAATTCGGATCCGCGGTAAC 3′, where N is A/C/G/T). The independent Tcf1 site selections were performed using the modified primer BSS-TCAA (5′ CTCGGTACCTCGAGTGAAGCTTGANNNTCAANNNNNNNNNNNNNNNNNNGGGAATTCGGATCCGCGGTAAC 3′). The fixed part of primer BSS-TCAA is the internal TCAA. MatMc, Sox4, and Tcf1 HMG domain proteins were subjected to a gel retardation assay, as described previously (11). In a binding reaction, the Ste11, MatMc, Sox4, and Tcf1 proteins (50 ng) were incubated in a volume of 15 μl containing 10 mM HEPES, 60 mM KCl, 1 mM EDTA, 1 mM dithiothreitol, and 15% glycerol. After addition of 10,000 cpm (equaling 0.5 ng) of the random probe, the reaction was left at room temperature for 20 min. Samples were electrophoresed through a 5% non-denaturing polyacrylamide gel in 0.25× TBE at room temperature. The wet gel was scanned using a Molecular Dynamics PhosphorImager; retarded protein-DNA complexes were excised from the gel, and the probe was isolated by electroelution. After phenol/chloroform extraction and subsequent precipitation with sodium acetate and ethanol, the eluted probe was amplified in a PCR containing the labeled primer A and unlabeled primer B (5′ CTCGGTACCTCGAGTGAAGCTTGA 3′) according to the following protocol: 5 min at 94 °C; 25 cycles of 30 s at 94 °C, 30 s at 55 °C, and 30 s at 72 °C; 5 min at 72 °C. The probe was purified on a 5% polyacrylamide gel and subsequently electroeluted.
New gel retardation reactions were repeated using recombinant protein and “enriched” random probe. The final enriched probe was amplified using unlabeled primers A and B, and the PCR product was subsequently digested with EcoRI and XhoI and cloned into pBluescript SK (Stratagene). The cloned products were sequenced using the T7 Sequenase dGTP reagent kit (Amersham Pharmacia Biotech) according to the manufacturer's protocol.

Gel retardation experiments were carried out as described previously (11). T4 polynucleotide kinase was used to label annealed oligonucleotides with [γ-32P]ATP. Oligonucleotides were purified on a 10% non-denaturing polyacrylamide gel and electroeluted. In a binding reaction, the recombinant Ste11, MatMc, Sox4, and Tcf1 proteins (50 ng) together with 1 μg of poly[d(I-C)] were incubated and electrophoresed as described above. The oligonucleotides used are as follows: Ste1, 5′ GGGGAGAACAAAGAAAGGG 3′, and Ste2, 5′ CCCTTTCTTTGTTCTCCCC 3′; Mat1, 5′ GGGAAGAACAATGGGGGGG 3′, and Mat2, 5′ CCCCCCCATTGTTCTTCCC 3′; Tcf1, 5′ GGGAAGATCAAAGGGGGGG 3′, and Tcf2, 5′ CCCCCCCTTTGATCTTCCC 3′; Sox1, 5′ GGGCAGAACAAAGGCCGGG 3′, and Sox2, 5′ CCCGGCCTTTGTTCTGCCC 3′.

Probes were labeled either at the positive or negative strand using [γ-32P]ATP and T4 polynucleotide kinase and purified as described above. The labeled probes were partially methylated at purine residues using dimethyl sulfate (30 Siebenlist U. Gilbert W. Proc. Natl. Acad. Sci. U. S. A. 1980; 77: 122-126). 100,000 cpm of methylated probe was used in a 5-fold scale up of the gel retardation binding reaction. After separation by gel retardation, the wet gel was subjected to autoradiography. The bound and free probes were excised and recovered by electroelution.
After cleavage by NaOH at the G and A residues, the reaction products were analyzed on a 12.5% polyacrylamide, 8 M urea sequencing gel. The probe used was 5′ CCTTCCAAGGTAGAACAAAGAAAGGAATTAAGG 3′, annealed with the complementary strand. The underline indicates the area within the primer corresponding to the Ste11-binding site referred to in Fig. 4.

Ste113 or Ste91 recombinant protein was bound to the optimal Ste11 oligonucleotide as described above. After a 30-min binding reaction, cold oligonucleotides were added in a 10-, 30-, 100-, or 300-fold excess. The cold oligonucleotides consisted either of the optimal Ste11 oligonucleotide described above or a non-optimal oligonucleotide consisting of SteNon-1 (5′ GGGGAGAACAAAGACCGGG 3′) annealed with SteNon-2 (5′ CCCGGTCTTTGTTCTCCCC 3′). All retarded bands were quantified on a Molecular Dynamics PhosphorImager using ImageQuant software.

The optimal and non-optimal Ste11-binding sites were cloned into pBend2 (31 Kim J. Zwieb C. Wu C. Adhya S. Gene (Amst.). 1989; 85: 15-23) using the following oligonucleotides: SteBend-1 (5′ CTAGGAGAACAAAGAAA 3′) annealed with SteBend-2 (5′ TGCATTTCTTTGTTCTC 3′) and SteNonBend-1 (5′ CTAGGAGAACAAAGACC 3′) annealed with SteNonBend-2 (5′ TCGAGGTCTTTGTTCTC 3′). Probes containing the binding site at different positions were generated using the restriction enzymes BglII, NheI, XhoI, EcoRV, and BamHI, after which the fragments were labeled using T4 polynucleotide kinase and [γ-32P]ATP. The binding of the Ste11 recombinant proteins to the different probes and the subsequent gel electrophoreses were performed as described above. Experiments comparing the different proteins or the different probes were always run on the same gel. Differences in bending angles for the different probes and proteins were determined according to the algorithm described by Thompson and Landy (32 Thompson J.F. Landy A. Nucleic Acids Res.
1988; 16: 9687-9705), using the center of each retarded band as indicated in Fig. 6.

The model of the Ste11-DNA complex was obtained by altering the existing Lef-1/DNA NMR model of Love et al. (21). By using the modeling program SETOR (33 Evans S.V. J. Mol. Graph. 1993; 11: 134-138), the helix 1-to-2 loop in the Lef-1 NMR model was shortened by removing amino acids Val-21 to Ser-24, to mimic the four amino acids missing from the Ste11 loop. To the C terminus, 11 amino acids were added to create a longer tail as present in the Ste113 recombinant protein. We modeled this tail to the minor groove at positions 11 and 12, where interactions were predicted to occur. The resulting structure was transformed to the schematic model of Fig. 7 using the VMD modeling program (34 Humphrey W. Dalke A. Schulten K. J. Mol. Graph. 1996; 14: 33-38). The resulting model is only a schematic representation of the binding site selections and footprint analyses and is not based on any NMR data of an Ste11-DNA complex.

RESULTS

Kjaerulff et al. (26) compared the Ste11-binding sites from genetically defined target genes and thereby defined the consensus Ste11 response element (or TR box) as AACAAAGAAA. Thus, the TR box was considerably longer than the 6–8-bp motifs reported for other HMG box proteins (2).
Several explanations could account for this phenomenon, such as the occurrence of an intimate interaction with another DNA-binding protein at the consensus binding site, the cooperation of the Ste11 HMG box with another DNA-binding domain in the Ste11 protein proper, or the actual recognition of an extended binding site by the single HMG box of Ste11. In order to test the hypothesis that HMG boxes can recognize extended sequence motifs, we performed PCR-mediated DNA-binding site selection. Similar selection assays have been performed previously for Sox-5 and for Sry (6, 13). In both cases, comparison of the selected sequences yielded a short sequence motif, AACAAT. The HMG box of Ste11 (Ste113) was produced as a glutathione S-transferase fusion protein in E. coli. The integrity of the recombinant HMG box protein was confirmed in a gel retardation assay using a putative Ste11-binding site from the mfm3 promoter (Ref. 35 Kjaerulff S. Davey J. Nielsen O. Mol. Cell. Biol. 1994; 14: 3895-3905, and data not shown). To initiate site selection, a gel retardation probe containing a core of 18 random base pairs was prepared by PCR. In order to select the highest affinity sites, conditions in gel retardation were chosen such that less than 10% of the probe was shifted in each round. After 5 rounds of gel retardation, excision of the retarded band, and regeneration of the retarded probe by PCR, the selected and amplified product was subcloned and sequenced. Examination of the sequences revealed that 31 of the 40 clones had selected a consensus GAACAAAGAAA motif directly adjacent to the fixed primer BSS. In the 9 other clones, the selected DNA motif did not directly flank the fixed primer.
Out of these 9 clones, 7 selected an extra A nucleotide 5′ of GAACAAAGAAA. We concluded that the fixed part of primer BSS had likely contributed to the selection of binding sites and that the fixed A nucleotide directly adjacent to the random nucleotides constituted a genuine contact base. Fig. 1 gives the sequences of the individual clones and the location of the conserved DNA motif with respect to the fixed bases in primer BSS. To extend these observations, we performed the same site selection assay on three other HMG domains, representative of the three subclasses of the sequence-specific HMG domain family (2), as follows: the S. pombe mating type factor MatMc (36 Kelly M. Burke J. Smith M. Klar A. Beach D. EMBO J. 1988; 7: 1537-1547), the lymphoid-specific factor Tcf1 (11), and the Sry-like factor Sox4 (14, 37 Schilham M. Oosterwegel M. Moerer P. Jing Y. de Boer P. van de Wetering M. Verbeek S. Lamers W. Kruisbeek A. Cumano A. Clevers H. Nature. 1996; 380: 711-714). Fig. 2 gives the amino acid sequences of the four HMG domain genes used in this study and compares these with the two HMG domains for which the protein/DNA structure has been determined, Lef-1 and Sry (20, 21).
In addition, it gives two other HMG boxes, Rox1 and Sox-5, for which a binding site has been determined (12, 28). Highly specific DNA motifs were retrieved for all HMG domains after 5–7 selection rounds. Clear differences were noted in the length of the selected HMG domain DNA motifs, as well as in the individual consensus sequences (see Table I). Both MatMc- and Tcf1-binding sites were predominantly located directly adjacent to a stretch of fixed G residues in primer BSS. Since these G residues probably favored binding much as found for the A residue with Ste11, we performed an independent site selection for Tcf1. The selected DNA motif was forced to the center of the randomized sequence by introduction of a fixed TCAA motif in that center. These Tcf1 selections showed a clear preference for a stretch of three G nucleotides flanking the DNA motif that was selected in the initial experiment (see Table I).

Table I. DNA motifs selected by the HMG boxes of Ste11, MatMc, Sox-4, and Tcf-1. The top three lines represent the binding motifs of Rox1 and Ste11 as deduced by comparison of multiple target promoter sites (26, 27, 28) and the binding motifs of Sry and Sox-5 as deduced via site selections (12, 13). The bottom four lines represent the motifs identified by site selection for the indicated HMG boxes. For each position, the frequency of the indicated bases is given relative to the total number of analyzed sequences. For Ste11, the frequency of the first two bases is relative to the number of sequences for which the random part is extended to this position. The second series of frequencies given for Tcf1 were obtained during the second site selection performed specifically with Tcf1 on a random primer containing the fixed TCAA motif.

Gel retardation was performed with the four HMG domains and their respective selected DNA motifs. This clearly demonstrated the specificity of each HMG domain for its optimal DNA motif (Fig. 3). Both Tcf1 and MatMc showed absolute preference for their own DNA motif, whereas Ste11 and Sox4 appeared to be more promiscuous. The existing structural and biochemical data did not provide a framework for the mode of binding to this extended DNA motif. We therefore performed a DMS methylation interf
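
The per-position base frequencies described in the Table I legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input sequences below are invented toy data, not the clones from Fig. 1, the helper names `position_frequencies` and `consensus` are hypothetical, and the sketch assumes that all selected sites have already been aligned to the same length (it does not reproduce the special handling of the first two Ste11 positions).

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(aligned):
    """Return one {base: frequency} dict per column of equal-length sites.

    Each frequency is relative to the total number of analyzed sequences.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    table = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(column)
        table.append({b: counts[b] / n for b in BASES})
    return table


def consensus(table):
    """Pick the most frequent base per position (ties resolved in ACGT order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in table)


if __name__ == "__main__":
    # Toy, made-up aligned sites used only to exercise the functions
    toy_sites = [
        "AGAACAAAGAAA",
        "AGAACAAAGAAA",
        "TGAACAAAGAAA",
        "AGAACAATGAAA",
    ]
    freqs = position_frequencies(toy_sites)
    print(consensus(freqs))
    print(freqs[0])
```

A frequency table of this kind makes it easy to see which positions of a selected motif are strictly conserved and which tolerate variation.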

Highlights

  • Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove

  • Footprinting with a deletion mutant of Ste11 reveals a novel interaction between the 3′ base pairs of the extended DNA motif and amino acids C-terminal to the HMG domain

  • The TR box was considerably longer than the 6–8-bp motifs reported for other HMG box proteins [2]


Summary

EXPERIMENTAL PROCEDURES

Recombinant HMG Box Proteins—Production of glutathione S-transferase fusion protein Ste113 (Ste11 amino acids 1–113) [26], MalE-MatMc-HMG fusion protein [7], and the His-tagged HMG boxes of Sox4 [14] and Tcf1 [29] has been described elsewhere. PCR-mediated Site Selection—A random probe was generated by labeling primer A (5′ GTTACCGCGGATCCGAATTCCC 3′) with [32P]ATP using T4 polynucleotide kinase and subsequent annealing of this primer A to primer BSS (5′ CTCGGTACCTCGAGTGAAGCTTGANNNNNNNNNNNNNNNNNNGGGAATTCGGATCCGCGGTAAC 3′, where N is A/C/G/T). MatMc, Sox4, and Tcf1 HMG domain proteins were subjected to a gel retardation assay, as described previously [11]. The probe was purified on a 5% polyacrylamide gel and subsequently electroeluted. New gel retardation reactions were repeated using recombinant protein and “enriched” random probe. Oligonucleotides were purified on a 10% non-denaturing polyacrylamide gel and electroeluted. The recombinant Ste11, MatMc, Sox4, and Tcf1 proteins (50 ng) together with 1 μg of poly[d(I-C)] were incubated and electrophoresed as described above. 100,000 cpm of methylated probe was used in a 5-fold scale up of the gel retardation binding reaction. The cold oligonucleotides consisted either of the optimal Ste11 oligonucleotide described above or a non-optimal oligonucleotide

HMG Box Factors Recognize Extended Base Pair Motifs