LEA proteins are a diverse group of probable desiccation protectants. They are developmentally induced by ovule abscission during the postabscission stage of late embryo development, and they are environmentally induced in cultured embryos, probably by a different mechanism or mechanisms, by ABA, desiccation, and osmotic stress (7, 8). Of the 18 Lea and LeaA mRNAs cloned from cotton (4), the mRNAs of the two related genes (4) Lea4 (cDNA clone D19) and LeaA2 (cDNA clone D132) show the highest cross-hybridization with embryo RNAs of other plants (K.S. Jakobsen, D.W. Hughes, G.A. Galau, unpublished observations). A recent compilation of Group 1 Lea sequences confirms very high conservation at the amino acid level among the cDNAs Em of wheat, D19 of cotton, p8B6 of radish, EMB-1 of carrot, Emb564 of maize, and B19 of barley (2). Extensive work with a wheat Em gene has defined putative ABA-responsive elements and a leucine zipper-containing protein with an affinity for these elements (6), and there is some evidence that these elements may be common in other kinds of Lea genes and in other environmentally responsive genes (6, 10). The cotton Lea4 cDNA D19 and the upstream portion of a cotton Lea4 gene have been sequenced (1). The amino acid sequence of cDNA D19 is peculiar in having a carboxy-terminal extension of 18 amino acids not found in its homologs (2), but the upstream region of the Lea4 gene does contain elements similar to those defined in a wheat Em gene (6), the only other Group 1 Lea gene sequenced (9). To confirm the peculiar carboxy-terminal extension and to help define the developmentally and environmentally responsive regions in the cotton Lea4 gene, both alloalleles of cotton Lea4 and 10 Lea4 cDNAs were sequenced. The principal results are that there is no carboxy-terminal extension in the protein and that transcriptionally important sequences are probably within 268 nucleotides of the transcription start.
Although the expression of Lea4 and LeaA2 mRNAs cannot be distinguished during the postabscission stage or during environmental induction in embryos (7, 8), earlier in embryo development the LeaA2 mRNAs, but not the Lea4 mRNAs, are induced to relatively low levels at the time of transiently high levels of ABA (7). To define the relationship of the proteins encoded by Lea4 and LeaA2, and to help understand the basis of their differential expression during development, both alloalleles of LeaA2 and two LeaA2 cDNAs were also sequenced. LeaA2-encoded proteins are very similar to the one encoded by Lea4, except that they contain a tandem duplication of a 20-amino acid sequence present only once in the Lea4-encoded protein.
The two alloalleles of Lea4 were sequenced along with portions of 10 Lea4 cDNA clones (Table I). Two alleles of each alloallele were identified in this collection (Table II). Figure 1 presents the sequence of the D genome alloallele, Lea4-D9 clone GD19-9S, as the reference Lea4 sequence. Lea4-D9 is very similar to, but not identical with, the sequence of cotton Lea4 clone gD19 (1). The transcribed region of the other Lea4 alloallele, Lea4-A13 clone GD19-13RS, is very similar to that of Lea4-D9 (Fig. 1). However, the two sequences show no similarity before Lea4-D9 nucleotide 1822, and they diverge after Lea4-D9 nucleotide 3279. For clarity, the sequence of Lea4-A13 in these two regions is presented separately in Figure 2. At least part of the difference at the 5' end is due to a repetitive element in Lea4-A13: the sequence between nucleotides 1033 and 1932 is present in reverse orientation upstream of the unrelated cotton Mat5-A gene, which encodes a 2S albumin storage protein (5). After about Lea4-D9 nucleotide 2630 in the 3'-nontranslated region, the genes contain a complex pattern of tandem imperfect repeats of about 80 to 120 nucleotides in length.
Multiple insertion/deletion events and substitutions prevent alignment after Lea4-D9 nucleotide 3279, although there are blocks of similar sequence until Lea4-D9 nucleotide 3422 and Lea4-A13 nucleotide 3732 (comparisons not shown). Lea4 cDNA D19 was completely sequenced along with major portions of nine other Lea4 cDNAs (Table I). Eight of the cDNAs, typified by cDNA D47, are transcribed from Lea4-D9. The ninth cDNA, D19, is transcribed from a Lea4-D allele that differs from Lea4-D9 in the mRNA region by a single substitution (Table II). At least three polyadenylation sites are used in Lea4-D9, and a fourth site is used in Lea4-D19 (Fig. 1). The 10th cDNA, D108, is transcribed from a Lea4-A allele that differs from Lea4-A13 in the mRNA region by a single insertion/deletion event (Table II). Both Lea4 alloalleles encode identical proteins, although there are four nucleotide substitutions in their amino acid-coding regions (Fig. 1). The protein does not contain the carboxy-terminal 18 amino acids that were reported earlier.
Supported by a grant from the National Institutes of Health.
Abbreviations: LEA, late embryogenesis abundant; kb, kilobase.