The basal elements of class II promoters are: (i) a -30 region, recognized by TATA binding protein (TBP); (ii) an initiator (Inr) surrounding the start site for transcription; and (iii) frequently, a downstream (+10 to +35) element. To determine the sequences that specify an Inr, we performed a saturation mutagenesis of the Inr of the SV40 major late promoter (SV40-MLP). The transcriptional activity of each mutant was determined both in vivo and in vitro. In both assays, transcriptional activity correlated closely with how well the sequence fit the optimal Inr sequence, 5'-CAG/TT-3'. Employing a neural network technique, we generated from these data a weight matrix definition of an Inr that can be used to predict the activity of a given sequence as an Inr. Using saturation mutagenesis data of TBP binding sites, we likewise generated a weight matrix definition of the -30 region element. We conclude the following: (i) Inrs are defined by the nucleotides immediately surrounding the transcriptional start site; and (ii) most, if not all, Inrs are recognized by the same general transcription factor(s). We propose that the mechanism of transcription initiation is fundamentally conserved, with the formation of pre-initiation complexes involving the concurrent binding of general transcription factors to the -30, Inr and, possibly, downstream elements of class II promoters.
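
As a rough illustration of how a weight matrix can be used to score a candidate site, the sketch below sums one per-position weight for each nucleotide and slides that score along a longer sequence. The matrix values, the four-position width, and the names `ILLUSTRATIVE_MATRIX`, `score_site` and `best_window` are all invented for this example; they are not the weights or the procedure derived in the study, and the neural network training step is not shown.

```python
from typing import Dict, List, Tuple

# Placeholder weights, one dict per position. These numbers are made up for
# illustration only and are NOT the matrix reported in the study.
ILLUSTRATIVE_MATRIX: List[Dict[str, float]] = [
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.5, "T": 0.5},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]


def score_site(site: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the per-position weights for a site as wide as the matrix."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix width")
    # Unknown symbols (for example 'N') contribute 0.0 in this sketch.
    return sum(col.get(base, 0.0) for col, base in zip(matrix, site.upper()))


def best_window(sequence: str, matrix: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return (offset, score) of the highest-scoring window in the sequence.

    Ties are resolved in favour of the leftmost window.
    """
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    best_offset, best_score = 0, score_site(sequence[:width], matrix)
    for offset in range(1, len(sequence) - width + 1):
        score = score_site(sequence[offset:offset + width], matrix)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    print(best_window("GGCAGTGG", ILLUSTRATIVE_MATRIX))
```

An additive score of this kind treats each position as contributing independently, which is the assumption a weight matrix encodes; any dependence between neighbouring positions would need a richer model.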