Abstract
Identifying regulatory elements and revealing their role in gene expression regulation remains a central goal of plant genome research. We exploited the detailed genomic sequencing information of a large number of Arabidopsis (Arabidopsis thaliana) accessions to characterize known cis-regulatory elements and to identify novel ones in gene promoter regions of Arabidopsis, relying on conservation as the hallmark signal of functional relevance. Based on the genomic layout and the obtained density profiles of single-nucleotide polymorphisms (SNPs) in sequence regions upstream of transcription start sites, the average length of promoter regions in Arabidopsis could be established at 500 bp. Genes whose upstream regions are highly variable are preferentially involved in environmental response and signaling processes, while low levels of promoter SNP density are common among housekeeping genes. Known cis-elements were found to exhibit a lower SNP density than sequence regions not associated with known motifs. For 15 known cis-element motifs, strong positional preferences relative to the transcription start site were detected based on their promoter SNP density profiles. Five novel candidate cis-element motifs were identified as consensus motifs of 17 sequence hexamers exhibiting increased sequence conservation combined with evidence of positional preferences, annotation information, and functional relevance for inducing correlated gene expression. Our study demonstrates that the currently available resolution of SNP data offers novel ways for the identification of functional genomic elements and the characterization of gene promoter sequences.
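To make the notion of an upstream SNP density profile concrete, the following is a minimal sketch of how such a profile could be computed. It is not the authors' pipeline: the function name `upstream_snp_density`, the input layout (a list of `(tss, strand)` pairs and a list of SNP positions on a single chromosome), and the default bin size of 50 bp are assumptions made for illustration. Only the 500 bp window is taken from the text.

```python
from bisect import bisect_left, bisect_right


def upstream_snp_density(genes, snps, window=500, bin_size=50):
    """Average SNPs per bp, binned by upstream distance from the TSS.

    genes: list of (tss, strand) pairs on one chromosome; strand is '+' or '-'
           (hypothetical input layout, assumed for this sketch).
    snps:  list of SNP positions on the same chromosome.
    Returns a list whose first entry is the bin closest to the TSS.
    """
    if not genes:
        raise ValueError("at least one gene is required")
    if window % bin_size != 0:
        raise ValueError("window must be a multiple of bin_size")

    snps = sorted(snps)
    counts = [0] * (window // bin_size)

    for tss, strand in genes:
        # Upstream means smaller coordinates on '+' and larger on '-'.
        if strand == "+":
            lo, hi = tss - window, tss - 1
        else:
            lo, hi = tss + 1, tss + window
        # Binary search for the SNPs that fall inside the upstream window.
        for pos in snps[bisect_left(snps, lo):bisect_right(snps, hi)]:
            distance = abs(pos - tss)  # 1 .. window
            counts[(distance - 1) // bin_size] += 1

    # Normalize by the number of bases contributing to each bin.
    bases_per_bin = len(genes) * bin_size
    return [c / bases_per_bin for c in counts]


if __name__ == "__main__":
    example_genes = [(1000, "+"), (2000, "-")]
    example_snps = [700, 940, 951, 990, 1000, 2010, 2500]
    print(upstream_snp_density(example_genes, example_snps, window=100, bin_size=50))
```

In a profile of this kind, a drop in density toward the TSS would be consistent with stronger conservation in the proximal promoter. A real analysis would also need to handle multiple chromosomes, overlapping neighboring genes, and per-accession coverage, all of which are omitted here.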
Highlights
Combining the information contained in three public databases of plant-specific cis-regulatory elements, AGRIS (Davuluri et al, 2003), Athena (O’Connor et al, 2005), and PLACE (Higo et al, 1998), we identified a set of 144 nonredundant known Arabidopsis cis-elements ranging in length from 5 to 49 nt (Supplemental Table S1).
Summary
Despite the recent discoveries of alternative mechanisms of gene expression regulation such as microRNA-mediated phenomena (Filipowicz et al, 2008; Chekulaeva and Filipowicz, 2009), epigenetic effects (Razin and Kantor, 2005; Karlic et al, 2010; Bronner et al, 2011), as well as the influence of global genome structural properties (Hatfield and Benham, 2002; Mellor, 2006), the control of gene transcription via gene-upstream cis-elements, most prominently those that act as specific binding sites for transcription factors, referred to as transcription factor-binding sites (TFBSs), remains a pivotal mode of gene expression regulation. One approach to bioinformatically detect functional sequence motifs relies on the assumption that they are likely conserved in orthologous gene promoter sequences across diverging species (Wasserman et al, 2000; Blanchette and Tompa, 2002, 2003). A combination of evolutionary conservation and coexpression-based approaches was applied to the plant Arabidopsis (Arabidopsis thaliana) in combination with Brassica oleracea (Haberer et al, 2006) and poplar (Populus spp.; Vandepoele et al, 2009). Both studies led to the confirmation of known motifs and the identification of many novel motifs. A large number of genomes is necessary to effectively detect short conserved and potentially functional regulatory motifs from sets of genomes of genetically close organisms. While this is a demanding requirement, the correct detection of equivalent sites across different genomes is greatly facilitated.