Abstract

Alternative usage of transcription start sites (TSSs) is one of the key mechanisms to generate gene variation in eukaryotes. Here, we show diversified molecular evolution of TSSs in remotely related flowering plants, rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), by comprehensive analyses of large collections of full-length cDNAs and genome sequences. We determined 45,917 representative TSSs within 23,445 loci of rice and 35,313 TSSs within 16,964 loci of Arabidopsis, about two TSSs per locus in either species. The nucleotide features around TSSs displayed distinct patterns when the most upstream TSSs were compared with downstream TSSs. We found that CG-skew and AT-skew were clearly different between upstream and downstream TSSs, and that this difference was commonly observed in rice and Arabidopsis. Relative entropy analysis revealed that the most upstream TSSs had retained canonical cis elements, whereas downstream TSSs showed atypical nucleotide features. Expression patterns were distinguishable between upstream and downstream TSSs. These results indicate that plant TSSs were generally diversified in downstream regions, resulting in the development of new gene expression patterns. Furthermore, our comparative analysis of TSS variation between the species showed a positive correlation between TSS number and gene conservation. Rice and Arabidopsis might have evolved novel TSSs in an independent manner, which led to diversification of these two species.
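As a rough illustration of the skew statistics mentioned above, the sketch below computes CG-skew and AT-skew for a sequence window. It assumes the commonly used definitions, (C − G)/(C + G) and (A − T)/(A + T); the exact formulas, window sizes, and strand conventions of the original analysis are not given in this excerpt, and the function names here are hypothetical.

```python
from collections import Counter


def skew(seq, x, y):
    """Return (x - y) / (x + y) for the counts of bases x and y in seq.

    Returns 0.0 when neither base occurs, to avoid division by zero.
    This definition is an assumption, not taken from the paper.
    """
    counts = Counter(seq.upper())
    total = counts[x] + counts[y]
    return (counts[x] - counts[y]) / total if total else 0.0


def cg_skew(seq):
    """CG-skew, assumed here to be (C - G) / (C + G)."""
    return skew(seq, "C", "G")


def at_skew(seq):
    """AT-skew, assumed here to be (A - T) / (A + T)."""
    return skew(seq, "A", "T")


def windowed_skew(seq, window, step, fn):
    """Apply a skew function to successive windows along seq."""
    return [fn(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]
```

Applying `windowed_skew` to sequences aligned at upstream and at downstream TSSs, and averaging per position, would give skew profiles of the kind the abstract compares between the two classes.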

Highlights

  • Alternative usage of transcription start sites (TSSs) is one of the key mechanisms to generate gene variation in eukaryotes

  • We mapped rice full-length cDNAs (FLcDNAs) and their 5′-end sequences onto the rice genome to identify TSSs by a previously described method (The Rice Annotation Project, 2007)

  • For Arabidopsis, 21,859 FLcDNAs and 255,302 5′-end sequences were mapped onto the Arabidopsis genome


Introduction

Alternative usage of transcription start sites (TSSs) is one of the key mechanisms to generate gene variation in eukaryotes. Over 90% of the analyzed yeast loci had more than two transcript variants derived from different TSSs (Miura et al., 2006). These results indicated that TSS variation could be observed widely in animals and fungi. Another study of Cap Analysis of Gene Expression data sets found that TSSs of orthologous genes did not always reside at equivalent locations in the human and mouse genomes (Frith et al., 2006). These observations have suggested flexibility and rapid turnover of TSSs during evolution. More than 580,000 5′- or 3′-end sequences of rice FLcDNAs have become available (Satoh et al., 2007). This wealth of sequence information allows us to conduct identification and comparative analyses of TSSs in more than 10,000 loci of these plants. Yamamoto et al. (2007a, 2007b) conducted a comprehensive analysis of promoter regions to detect frequently observed octamers derived from the TATA box, Y patch, and CpG and reported that different octamers could be used for different gene expression mechanisms.
