Abstract

The identification of cis-regulatory modules (CRMs) can greatly advance our understanding of gene regulatory mechanisms. Despite the existence of binding sites of more than three transcription factors (TFs) in a CRM, studies in plants often consider only the co-occurrence of binding sites of one or two TFs. In addition, CRM studies in plants are limited to combinations of only a few families of TFs. It is thus not clear how widely plant TFs work together, which TFs work together to regulate plant genes, and how the combinations of these TFs are shared by different plants. To fill these gaps, we applied a frequent pattern-mining-based approach to identify frequently used cis-regulatory sequence combinations in the promoter sequences of two plant species, Arabidopsis (Arabidopsis thaliana) and poplar (Populus trichocarpa). A cis-regulatory sequence here corresponds to a DNA motif bound by a TF. We identified 18,638 combinations composed of two to six cis-regulatory sequences that are shared by the two plant species. In addition, using known cis-regulatory sequence combinations, gene function annotation, gene expression data, and known functional gene sets, we showed that the functionality of at least 96.8% and 65.2% of these shared combinations in Arabidopsis is partially supported, under false discovery rates of 0.1 and 0.05, respectively. Finally, we discovered that 796 of the 18,638 combinations might relate to functions that are important in bioenergy research. Our work will facilitate the study of gene transcriptional regulation in plants.
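To make the idea of mining frequent cis-regulatory sequence combinations concrete, the following is a minimal sketch and not the authors' implementation. It assumes each promoter has already been reduced to the set of motifs found in it, and it uses brute-force enumeration in place of an optimized frequent pattern-mining algorithm. The function name, gene identifiers, motif labels, and support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(promoter_motifs, min_support, max_size=6):
    """Return motif combinations (size 2..max_size) seen in >= min_support promoters.

    promoter_motifs maps a gene identifier to the motifs found in its promoter.
    Brute-force counting: fine for a toy example, too slow for whole genomes.
    """
    counts = Counter()
    for motifs in promoter_motifs.values():
        present = sorted(set(motifs))  # canonical order so each combo has one key
        for k in range(2, min(max_size, len(present)) + 1):
            for combo in combinations(present, k):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical toy data: gene names and motif labels are placeholders.
arabidopsis = {
    "at_gene_1": ["motif_A", "motif_B", "motif_C"],
    "at_gene_2": ["motif_A", "motif_B"],
    "at_gene_3": ["motif_C", "motif_A", "motif_B"],
}
poplar = {
    "pt_gene_1": ["motif_A", "motif_B"],
    "pt_gene_2": ["motif_B", "motif_A", "motif_D"],
    "pt_gene_3": ["motif_D"],
}

# Combinations that are frequent in both species (support threshold is assumed).
frequent_at = frequent_combinations(arabidopsis, min_support=2)
frequent_pt = frequent_combinations(poplar, min_support=2)
shared = set(frequent_at) & set(frequent_pt)
print(shared)
```

On this toy input only the pair of `motif_A` and `motif_B` is frequent in both species. A dedicated frequent itemset miner would prune candidates instead of enumerating every subset, which matters once promoters contain many motifs.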

Highlights

  • Identifying cis-regulatory modules (CRMs) is important for the understanding of gene transcriptional regulation (Singh, 1998; Yuh et al, 1998; Blanchette et al, 2006; Hu et al, 2008; Cai et al, 2010)

  • Two of the best experimentally studied CRM systems so far may be the CRMs in the Eve gene in Drosophila and those in the Endo16 gene in the sea urchin, in which the locations of transcription factor binding site (TFBS) instances and the expression patterns controlled by these CRMs are identified (Howard and Davidson, 2004)

  • A cis-regulatory sequence is a motif in the PLACE database (Higo et al, 1999), which is represented as a consensus sequence such as RTACGTGGCR
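A consensus sequence such as RTACGTGGCR uses IUPAC nucleotide codes, where R stands for A or G. The sketch below shows one way such a consensus could be matched against a promoter sequence by translating it into a regular expression. It is illustrative only: the function names and the example sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches on the forward strand.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment, made up for illustration.
print(find_sites("RTACGTGGCR", "TTGTACGTGGCATT"))
```

A fuller scan would also search the reverse complement, since a binding site can occur on either strand.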


Summary

Introduction

Identifying cis-regulatory modules (CRMs) is important for the understanding of gene transcriptional regulation (Singh, 1998; Yuh et al, 1998; Blanchette et al, 2006; Hu et al, 2008; Cai et al, 2010). Many studies, both experimental and computational, have identified CRMs in animals (Yuh et al, 1998; Kel-Margoulis et al, 2000; Loots et al, 2000; Frith et al, 2001; Andrioli et al, 2002; Zhou and Wong, 2004; Gupta and Liu, 2005; Blanchette et al, 2006; Hu et al, 2008; Cai et al, 2010). Predicted CRMs are indispensable and useful because genomes are enormous and verifying a CRM experimentally is time-consuming.

