Abstract

During development eukaryotic gene expression is coordinated by dynamic changes in chromatin structure. Measurements of accessible chromatin are used extensively to identify genomic regulatory elements. Whilst chromatin landscapes of pluripotent stem cells are well characterised, chromatin accessibility changes in the development of somatic lineages are not well defined. Here we show that cell-specific chromatin accessibility data can be produced via ectopic expression of E. coli Dam methylase in vivo, without the requirement for cell-sorting (CATaDa). We have profiled chromatin accessibility in individual cell-types of Drosophila neural and midgut lineages. Functional cell-type-specific enhancers were identified, as well as novel motifs enriched at different stages of development. Finally, we show global changes in the accessibility of chromatin between stem-cells and their differentiated progeny. Our results demonstrate the dynamic nature of chromatin accessibility in somatic tissues during stem cell differentiation and provide a novel approach to understanding gene regulatory mechanisms underlying development.

https://doi.org/10.7554/eLife.32341.001

eLife digest

For an embryo to successfully develop into an adult animal, specific genes must act in different types of cells. Though all the cells have the same genes encoded within their DNA, looking at the way that the DNA is packaged can indicate which parts of the DNA are important for that particular cell type. If regions of DNA are “open” one can infer that those regions are actively involved in gene regulation, whereas “closed” regions are considered less important. It is currently difficult to determine which parts of the DNA are open within an individual cell type in a complex organ, such as the brain.
Existing methods require the cells to be physically isolated from the tissue, which is technically challenging. To overcome this issue, Aughey et al. have now developed a method that does not require isolation of the cells. The new technique involves using genetic engineering to introduce an enzyme called Dam into specific cell types in living fruit flies. This enzyme adds a chemical label to regions of open DNA, which can then be detected. Aughey et al. tested this technique on various cells of the developing brain and gut, and were able to see differences in the openness of DNA that corresponded to the action of genes that are important in each cell type. The data also contain trends that help to understand the role of open DNA in development. For example, mature cells were shown to have less open DNA overall than the stem cells that divide to generate them. Aughey et al. hope their new technique will be of use to other researchers working with either fruit flies or mammalian tissues. The knowledge that scientists will gain from identifying how open DNA contributes to gene regulation, in both healthy and diseased tissues, will further our understanding of human development and the biology of diseases such as cancer.

https://doi.org/10.7554/eLife.32341.002

Introduction

During the development of a multicellular organism, gene expression is tightly regulated in response to spatially and temporally restricted signals. Changes to gene expression are accompanied by concomitant changes to chromatin structure and composition. Therefore, chromatin states vary widely across developmental stages and cell types. Functional regions of a genome, including promoters and enhancers, can be identified by their relative lack of nucleosomes. These regions of ‘open chromatin’ can be assayed by their accessibility to extrinsic factors. Consequently, chromatin accessibility profiling techniques are commonly used to investigate chromatin states (reviewed in Tsompana and Buck, 2014).
Chromatin is highly accessible in pluripotent cell types such as embryonic stem (ES) cells, but is compacted following differentiation (Meshorer and Misteli, 2006). It has been suggested that this open chromatin represents a permissive state to which multiple programmes of gene regulation may be rapidly applied upon differentiation (Gaspar-Maia et al., 2011). The nature of chromatin accessibility across different developmental stages in vivo is less well understood. Imaging studies have been used to demonstrate gross changes to chromatin structure; for example, changes to the distribution of heterochromatin have been observed in post-mitotic cells (Francastel et al., 2000; Le Gros et al., 2016). Molecular studies investigating chromatin states in vivo during development have tended to utilise heterogeneous tissues, because profiling the epigenome of individual cell types frequently requires physical isolation of cells or nuclei, which can be laborious and prone to human error (McClure and Southall, 2015). Therefore, there is a lack of information regarding cell-type-specific changes to chromatin states in in vivo models. Whilst recently developed methods such as ATAC-seq have become popular and address many of the limitations inherent to earlier techniques such as DNase-seq (i.e. they require fewer cells and are faster to perform), these techniques still require the physical separation of cells and isolation of genomic DNA before chromatin accessibility is assayed (Buenrostro et al., 2013). It has been suggested that ectopic expression of untethered DNA adenine methyltransferase (Dam) results in specific methylation of open chromatin regions whilst nucleosome-bound DNA is protected (Wines et al., 1996; Bulanenkova et al., 2007; Boivin and Dura, 1998; Singh and Klar, 1992). However, the efficacy of using Dam methylation for chromatin accessibility profiling on a genomic scale is not clear.
Furthermore, expression of Dam in a cell-type-specific manner, at levels low enough to avoid toxicity and oversaturated signal, has not been possible until now.

Transgenic expression of fusions of Dam to DNA-binding proteins is a well-established method used to assess transcription factor occupancy (DNA adenine methyltransferase identification, DamID) (van Steensel and Henikoff, 2000). Recently, it was demonstrated that DamID could be adapted to profile DNA-protein interactions in a cell-type-specific manner by utilising ribosome re-initiation to attenuate transgene expression (Marshall et al., 2016; Aughey and Southall, 2016; Southall et al., 2013). This technique is referred to as Targeted DamID (TaDa).

Here, we take advantage of TaDa to express untethered Dam in specific cell types to produce chromatin accessibility profiles in vivo, without the requirement for cell separation. We show that Chromatin Accessibility profiling using Targeted DamID (CATaDa) yields comparable results to both FAIRE and ATAC-seq methods, indicating that it is a reliable and reproducible method for investigating chromatin states. By assaying multiple cell types within a tissue, we show that chromatin accessibility is dynamic throughout the development of Drosophila central nervous system (CNS) and midgut lineages. These data have also enabled us to identify enriched motifs from regulatory elements that dynamically change their accessibility during differentiation, as well as to identify functional cell-type-specific enhancers. Finally, we show that compared to their differentiated progeny, somatic stem cell Dam-methylation signals are more widely distributed across the genome, indicating a greater level of global chromatin accessibility.

Results

CATaDa produces chromatin accessibility profiles comparable to that of ATAC and FAIRE-seq in Drosophila eye discs

We reasoned that low-level expression of transgenic E.
coli Dam, using tissue-specific GAL4 drivers in Drosophila, would specifically methylate regions of accessible chromatin exclusively in a cell-type of interest. Detection of these methylated sequences could yield chromatin accessibility profiles for defined cell populations in vivo (Figure 1). To determine if CATaDa produces an accurate reflection of chromatin accessibility, we compared data acquired using this approach with commonly used alternative techniques. A recent study generated ATAC and FAIRE-seq data from Drosophila imaginal eye discs (Davie et al., 2015). Using CATaDa, we expressed E. coli Dam in the eye disc of Drosophila third instar larvae so that we could compare Dam methylation profiles to these previously collected data.

Figure 1. Schematic illustrating CATaDa technique. (A–B) E. coli Dam is expressed specifically in cell-types of interest using TaDa technique. (C) GATC motifs in regions of accessible chromatin are methylated by Dam, whilst areas of condensed chromatin prevent access to Dam thereby precluding methylation. (D) Methylated DNA is detected to produce chromatin accessibility profiles for individual cell-types of interest from a mixed population of cells. https://doi.org/10.7554/eLife.32341.003

Chromatin accessibility profiles produced with CATaDa in the eye disc were highly reproducible between replicates (r² = 0.947) (Figure 2—figure supplement 1). CATaDa profiles showed good agreement with data produced with ATAC-seq and FAIRE-seq. Visual inspection of the data showed that many regions of accessible chromatin identified by ATAC and FAIRE are also represented by CATaDa, whilst condensed regions are reliably inaccessible (Figure 2A,B). We also observe that CATaDa profiles exhibited features consistent with chromatin accessibility. For example, open chromatin is enriched at transcriptional start sites (TSS) (Figure 2C).
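
As an illustration of how an aggregation plot of the kind shown in Figure 2C can be computed, the following is a minimal Python sketch and not the pipeline used in this study. It assumes a binned signal track held in a list and a list of TSS positions with strands; the function name `aggregate_at_tss` and the toy values are hypothetical.

```python
from statistics import mean

def aggregate_at_tss(signal, tss_sites, flank):
    """Average a binned signal track in windows centred on each TSS.

    signal    : list of numbers, one value per genomic bin (toy data here)
    tss_sites : list of (bin_index, strand) tuples, strand is '+' or '-'
    flank     : number of bins to include on each side of the TSS
    """
    windows = []
    for pos, strand in tss_sites:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(signal):
            continue  # skip sites whose window runs off the track
        window = signal[start:end]
        if strand == '-':
            window = window[::-1]  # orient so upstream is always on the left
        windows.append(window)
    if not windows:
        return []
    # Column-wise mean across all oriented windows
    return [mean(column) for column in zip(*windows)]

# Toy track with two TSS, one on each strand (values are invented)
signal = [0, 1, 5, 1, 0, 0, 3, 6, 2, 0]
tss_sites = [(2, '+'), (7, '-')]
profile = aggregate_at_tss(signal, tss_sites, flank=2)
print(profile)
```

Windows on the minus strand are reversed before averaging, so that the aggregated profile reads upstream to downstream for every gene.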

Figure 2. Validation of Dam chromatin accessibility profiling compared to ATAC and FAIRE-seq. (A) Chromatin accessibility across chromosome three as determined by ATAC-seq, FAIRE-seq, and CATaDa. Note the reduced amount of open chromatin proximal to the centromere regions in all three datasets. y-axes = reads per million (rpm). (B) Example locus showing data obtained by FAIRE, ATAC, and CATaDa. Peaks are broadly reproducible across techniques. (C) Aggregation plot of CATaDa signal at TSS with 2 kb regions up and downstream. Aggregated signal at TSS shows expected enrichment of Dam. (D) Aggregation plot of CATaDa signal at ATAC or FAIRE peaks, indicating enrichment of CATaDa signal at these loci. (E) Identification of ATAC peaks in CATaDa or FAIRE data. CATaDa and FAIRE identify 48.6% and 55.9% of ATAC peaks, respectively. FAIRE-seq peaks overlap more frequently with promoter proximal peaks (2 kb from TSS), whilst CATaDa peaks overlap with more ATAC peaks outside of promoter regions. https://doi.org/10.7554/eLife.32341.004

We observe that CATaDa signal frequency increases dramatically towards the centre of ATAC or FAIRE peaks (Figure 2D). The overlap of Dam-identified peaks with ATAC and FAIRE peaks is 48.6% and 49.4%, respectively (in comparison, 55.9% of ATAC peaks are also identified in FAIRE data; Figure 2E). A Monte Carlo simulation determined that this is a highly significant overlap (p < 1 × 10⁻⁵) and peak heights at shared ATAC and CATaDa peaks show significant correlation (p < 1 × 10⁻¹⁶, r² = 0.138) (Figure 2—figure supplement 2A). We found that increasing the stringency of our peak calling notably decreased the number of peaks identified that coincided with ATAC peaks, but had relatively little impact on unique CATaDa peak discovery (Figure 2—figure supplement 2B).
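
The procedure used for the Monte Carlo simulation is not described in this section. One common approach, sketched below in Python purely as an assumption, is to reposition one peak set at random many times and ask how often the shuffled set overlaps the reference set at least as often as the observed set does. All function names, coordinates, and the genome length are hypothetical.

```python
import random

def count_overlaps(query, reference):
    """Count query intervals (start, end), half-open, that overlap any reference interval."""
    return sum(
        1 for qs, qe in query
        if any(rs < qe and qs < re_ for rs, re_ in reference)
    )

def shuffle_intervals(intervals, genome_length, rng):
    """Move each interval to a uniformly random position, preserving its length."""
    shuffled = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def monte_carlo_overlap(query, reference, genome_length, n_iter=1000, seed=0):
    """Return the observed overlap count and an empirical one-sided p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    as_extreme = 0
    for _ in range(n_iter):
        shuffled = shuffle_intervals(query, genome_length, rng)
        if count_overlaps(shuffled, reference) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero
    p_value = (as_extreme + 1) / (n_iter + 1)
    return observed, p_value

# Toy peak sets on an invented 1 Mb "genome"
reference_peaks = [(100, 200), (1000, 1100), (5000, 5100)]
query_peaks = [(150, 250), (1050, 1150), (5050, 5150)]
observed, p_value = monte_carlo_overlap(query_peaks, reference_peaks, 1_000_000)
print(observed, p_value)
```

With this design the smallest reportable p-value is 1/(n_iter + 1), so a bound such as p < 1 × 10⁻⁵ would need at least 100,000 iterations. A realistic analysis would also shuffle within chromosomes and exclude unmappable regions, which this sketch does not attempt.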
Given these data, we suggest that the majority of these peaks are not false positives, but are genuinely accessible sites that are not detected by ATAC-seq. Further examination of these unique peaks indicates that they are significantly smaller than the shared peaks (Figure 2—figure supplement 2C). We also observe that for peaks identified in either ATAC or FAIRE data that are not present in CATaDa, there is a relative lack of GATC motifs, which suggests that there may be cases in which false negatives are observed due to the limitations of the resolution achievable by Dam methylation (Figure 2—figure supplement 3A–B).

To further investigate the differences between CATaDa and ATAC or FAIRE-seq, we investigated the detection of peaks at different genomic features. We found that whilst CATaDa identified fewer peaks than FAIRE in regions proximal to gene promoters (when compared with ATAC), CATaDa was notably better than FAIRE-seq at identifying accessible sites in ATAC data that were not adjacent to promoters (Figure 2E). The lack of promoter peaks identified can again be explained by the relative depletion of GATC sites upstream of TSS (Figure 2—figure supplement 3C).

It was previously shown that ATAC-seq and FAIRE-seq data demonstrated high chromatin accessibility at experimentally validated eye-antennal enhancers (Davie et al., 2015). CATaDa profiles similarly showed increased open chromatin at these regions (Figure 3A,B). We found that for 57.9% of FlyLight eye enhancers, a corresponding peak was called in CATaDa profiles (333 of 575 enhancers). CATaDa was comparable to FAIRE-seq and ATAC-seq, which identified 48% and 68.7%, respectively, of validated FlyLight enhancers as peaks (Figure 3C).

Figure 3. Identification of validated imaginal disc enhancers with CATaDa. (A) Example loci showing data obtained by FAIRE, ATAC, and CATaDa. Peaks are broadly reproducible across techniques.
FlyLight enhancers with validated expression in eye imaginal discs coincide with peaks in all three datasets. Corresponding expression pattern is shown in (i) and (ii) (eye disc images obtained from the FlyLight database [http://flweb.janelia.org/cgi-bin/flew.cgi]). (B) Aggregation plot showing average signal of ATAC (blue) and Dam (green) at 575 FlyLight enhancers with validated eye imaginal disc expression. Both techniques show increased open chromatin at these regions. (C) Venn diagram of FlyLight enhancers identified in Dam accessibility profiling, ATAC, or FAIRE-seq. The majority of enhancers identified by either ATAC or FAIRE are also found in the Dam data. Dam enhancers overlap most with ATAC (305 shared between ATAC and Dam of 575 total FlyLight enhancers). https://doi.org/10.7554/eLife.32341.008

CATaDa profiling shows dynamic changes in chromatin accessibility during differentiation of the nervous system

In Drosophila, neurons are derived from asymmetrically dividing neural stem cells (NSCs). NSC divisions produce one self-renewing daughter NSC and a ganglion mother cell (GMC), which divides once more to produce neurons or glia (Homem and Knoblich, 2012). To further test the technique and investigate how local and global chromatin accessibility changes during the process of nervous system differentiation, we expressed Dam in specific cells with GAL4 drivers that cover four different developmental stages within the lineage. These include NSCs (worniu-GAL4), GMCs and newly born neurons (R71C09-GAL4 [Figure 4—figure supplement 1B; Li et al., 2014]), differentiated larval neurons (nSyb-GAL4), and also mature adult neurons (nSyb-GAL4) (Figure 4A).

Figure 4. Chromatin accessibility of cell types in the CNS. (A) Schematic of CNS lineage progression indicating cell types examined in this study. (B) Example profiles resulting from Dam expression in the CNS.
Genomic region encompassing Wnt2 and bruchpilot genes is shown. Multiple open chromatin regions are dynamic across development. y-axes = reads per million (rpm). (C) Clustering of differentially accessible regions in CNS lineages indicates two major groupings in which chromatin is most accessible in either stem cells or mature neurons. (D) Motif analysis using these sequences results in identification of expected motifs (e.g. ase E-box motif in stem cell accessible loci), as well as novel motifs. Most highly enriched motifs for each cluster shown. E-values for all motifs < 1 × 10⁻⁵. (E) log2 enrichment scores for selected GO terms in individual cell types. Clear trends can be seen as development progresses (NSC, GMC, L3 neuron, adult neuron, from left to right). (i) GO terms are either enriched in stem cells, becoming less significant as the lineage progresses, or (ii) vice versa. https://doi.org/10.7554/eLife.32341.009

By examining candidate genes differentially expressed during neural development, we observed that chromatin accessibility relates to gene expression in an expected manner. For example, intronic open chromatin peaks can be seen at the bruchpilot (brp) locus, in both third instar (L3) and adult neurons, whilst these peaks are reduced or absent in the progenitor cell types (Figure 4B). This corresponds with the expression of brp, which is specifically transcribed in neurons and has an important role in synapse function (Wagh et al., 2006). In contrast, the gene adjacent to brp, Wnt2, displays peaks which are most apparent in the NSC and intermediate cell types. Wnt signalling is known to be important for the control of stem cell populations; therefore, these results are also expected (Ring et al., 2014). Similar patterns are observed at a number of other loci. At the asense (ase) locus (encoding an NSC-specific transcription factor), chromatin is highly accessible at the promoter and upstream intergenic region in NSCs (Figure 4—figure supplement 2B).
This signal is considerably reduced in fully differentiated neurons in which ase is not expressed. Interestingly, open chromatin is still detectable in these regions in the GMCs/newly born neurons. This pattern is also observed with other NSC expressed factors such as deadpan (dpn), CyclinE (CycE) and prospero (pros) (Figure 4—figure supplement 2). Furthermore, GMC/newly born neuron profiles frequently show intermediate signal at these loci, indicating that functional elements required for regulation of NSC gene expression are not immediately rendered inaccessible following differentiation (Figure 4B and Figure 4—figure supplement 2). It is to be expected that many of the functional elements marked by accessible chromatin that are important for regulating gene expression in a given neural cell type would show dynamic accessibility across the lineage (i.e. stem cell-specific enhancers would not be expected to be open in mature neurons). We examined regions of differential chromatin accessibility to determine the extent to which chromatin accessibility is changed during development of the nervous system. Hierarchical clustering of regions of chromatin with differential accessibility between cell types reveals two major clusters in which chromatin is either open in stem cells but inaccessible in neurons, or vice versa (Figure 4C). Intriguingly, there are other clusters where maximal chromatin accessibility is observed in either GMCs/early neurons or larval neurons. Therefore, it is not as simple as NSC accessible regions progressively closing during differentiation and neuronal regions gradually opening. There are a large number of loci that are inaccessible in NSCs, then open in the intermediate GMCs/newly born neurons stage before being rendered inaccessible again in terminally differentiated neurons (Figure 4C). 
In addition, a cluster enriched in larval neurons demonstrates that the chromatin accessibility landscape of larval neurons, although similar, is distinct from that of adult neurons.

Regions of open chromatin are thought to identify functional regulatory elements such as enhancers. Therefore, it is to be expected that these regions will be enriched for motifs belonging to transcription factors involved in neurogenesis. Identification of enriched motifs in sequences that were accessible in NSCs showed that expected transcription factor binding sites were highly enriched. One example is the E-box motif CAGCNG, which is bound by the NSC proneural factor ase (Figure 4D) (Southall and Brand, 2009; Jarman et al., 1993). Regions in which open chromatin was specifically enriched in mature neurons yielded a sequence motif corresponding to the transcription factor Ci. In all groups, sequence motifs were also identified for which no known binding partner could be identified (Figure 4—figure supplement 3). Analysis of further subdivision of these clusters revealed yet more novel motifs for the individual cell types examined, as well as indicating that the ase-like motif is specifically enriched for sequences which are accessible solely in the NSCs, and not their progeny (Figure 4—figure supplement 4).

Gene ontology (GO) analysis of genes at which enriched chromatin accessibility was observed yielded expected biological process terms for each of the cell types examined (Figure 4—figure supplement 5). For example, terms such as ‘neuroblast fate determination’ and ‘chromosome segregation’ were more highly enriched in stem cells relative to neurons, whilst ‘regulation of behaviour’ and ‘synaptic vesicle docking during exocytosis’ were enriched for differentiated neurons but not NSCs (Figure 4E).
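
To make the E-box consensus concrete, the short Python sketch below locates occurrences of CAGCNG on both strands of a sequence. It is an illustration only and not the motif enrichment analysis used in this study; the function names and the toy sequence are hypothetical, and N is taken to mean any of A, C, G or T.

```python
import re

# Map each motif letter to a regular-expression fragment (only A, C, G, T, N handled)
IUPAC = {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'N': '[ACGT]'}
COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def motif_pattern(motif):
    """Compile a lookahead pattern so that overlapping matches are all reported."""
    body = ''.join(IUPAC[base] for base in motif)
    return re.compile('(?=(' + body + '))')

def find_motif_sites(seq, motif):
    """Return sorted (start, strand) pairs for motif matches on either strand."""
    seq = seq.upper()
    sites = [(m.start(), '+') for m in motif_pattern(motif).finditer(seq)]
    # A match to the reverse-complemented motif is a match on the minus strand
    rc_pattern = motif_pattern(reverse_complement(motif))
    sites += [(m.start(), '-') for m in rc_pattern.finditer(seq)]
    return sorted(sites)

# Invented sequence containing CAGCTG, CAGCGG and CCGCTG
toy_sequence = "TTCAGCTGAACAGCGGTTCCGCTGAA"
sites = find_motif_sites(toy_sequence, "CAGCNG")
print(sites)
```

Because CAGCTG is its own reverse complement, that site is reported once per strand; an enrichment analysis would normally count such palindromic sites once and compare counts against a background sequence set.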

Chromatin accessibility in adult midgut cell types

Having observed chromatin accessibility changes in the cells of the developing CNS, we asked whether similar patterns would be observed in adult somatic stem cell lineages. The Drosophila midgut contains a pool of cycling intestinal stem cells (ISCs) that persists in the adult to maintain a population of terminally differentiated cells which mediate the absorptive and secretory functions of the organ (Jiang and Edgar, 2011; Nászai et al., 2015). In contrast to neurogenesis, a single committed immature progenitor cell (enteroblast – EB) is produced from stem cell divisions, which then differentiates without further divisions to produce the mature epithelial cells of the midgut (Ohlstein and Spradling, 2007). To examine chromatin accessibility in the cells of the adult midgut, we expressed Dam in the ISCs and EBs, as well as in the terminally differentiated absorptive cells, the enterocytes (ECs) (Figure 5A).

Figure 5. Dam chromatin accessibility profiling of cells in the adult midgut. (A) Schematic of midgut lineage progression indicating cell types examined in this study. (B) Chromatin accessibility displays expected trends at the escargot locus, known to be expressed exclusively in ISCs and EBs, but not ECs. Upstream promoter region shows greatest chromatin accessibility in ISCs, compared to other cell types. Similarly, dynamic peaks are observed in both 3’ and 5’ distal regions (putative enhancer regions), which are absent in ECs. y-axes = reads per million (rpm). (C) Chromatin accessibility at the nubbin locus, known to be expressed exclusively in ECs. y-axes = reads per million (rpm). (D) Hierarchical clustering of differentially accessible regions in gut cell types.
Major clusters are observed in which accessible chromatin is enriched specifically in either ISCs or ECs, whilst smaller clusters indicate fewer regions with up or down-regulated accessibility in EBs. (E) Principal component analysis (mean of all replicates) indicates distinct groupings of both lineages. (F) Correlation matrix (Spearman’s rank) of means of all cells in CNS and midgut lineages. Individual lineages denoted with red outline. Note relatively high correlation between NSC and ISC (Asterisk – R² = 0.76), whilst NSC correlation with EC and adult neurons are comparable. https://doi.org/10.7554/eLife.32341.016

As with the CNS data, we observed predictable changes in chromatin accessibility at loci for genes with variable expression in the lineage. For example, escargot (esg), a transcription factor required for ISC self-renewal (Korzelius et al., 2014), displays multiple peaks of accessible chromatin at the gene body and surrounding region in ISCs and EBs, whilst little signal is observed in the ECs (Figure 5B). In contrast, the nubbin locus (encoding the EC marker Pdm1) displays peaks predominantly in the EC data, with relatively closed chromatin in the progenitor cell types (Figure 5C). As observed in the CNS, hierarchical clustering revealed two major groups in which accessible chromatin was enriched in either the stem cells (ISCs) or the differentiated cells (ECs) (Figure 5D). Smaller clusters were again evident in which accessible chromatin was up or downregulated exclusively in the intermediate EBs. However, this was much less pronounced than the changes observed in GMCs/early neurons of the developing CNS. This indicates that, similar to the CNS lineages, the majority of chromatin accessibility changes involved in specifying the fully differentiated cells do not occur until after EB maturation. As with the cells of the CNS, we were able to identify motifs specifically enriched in each of these groups (Figure 5—figure supplement 1).
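
The correlation matrix in Figure 5F is based on Spearman's rank correlation. As a minimal illustration of that statistic, and not a reconstruction of the analysis performed here, the Python sketch below ranks each signal track (averaging ranks for ties) and computes the Pearson correlation of the ranks. The track names and values are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(tracks):
    """Pairwise Spearman correlations for a dict of name -> signal list."""
    names = list(tracks)
    return {a: {b: spearman(tracks[a], tracks[b]) for b in names} for a in names}

# Invented binned signal for three hypothetical cell types
tracks = {
    "NSC": [5, 4, 3, 2, 1, 0],
    "GMC": [5, 3, 4, 2, 1, 0],
    "neuron": [0, 1, 2, 3, 4, 5],
}
matrix = correlation_matrix(tracks)
print(round(matrix["NSC"]["GMC"], 3))
```

Rank-based correlation is insensitive to differences in signal scale between samples, which is one reason it is often preferred when comparing sequencing tracks of differing depth.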

ISCs and NSCs fulfil similar roles in their respective organs in the production of highly specialised functional cells. However, whilst NSCs exist for a short amount of time during fly development to produce relatively long-lived neurons that persist in the adult CNS for the animal’s lifetime, the ISCs act post-developmentally to constantly replenish ECs in the adult gut. By comparing the chromatin accessibility of these two cell types, it is apparent that there are similarities in their chromatin states. At loci involved in growth or cell division, we see similar accessibility profiles across differentiation between the two tissue types (Figure 5—figure supplement 1).

Given the similarities that we observed for individual loci between CNS and midgut lineages, we queried whether it was possible to observe trends between the cells in the two lineages on a global scale. Principal component analysis reveals two distinct clusters in which >80% of the variance is explained in the first two principal components (Figure 5E). These clusters represent the two distinct tissue types (CNS and midgut) rather than immature and differentiated cells. By examining the overall correlation between all cell types, we observed a number of interesting features. Firstly, as expected, all cell types correlated most closely with either their direct progeny or progenitor cell (Figure 5F). Therefore, by clustering the data we were able to recapitulate the familial relationship between the cell types of the two lineages. The greatest similarities were observed between the intermediate progenitors and their cognate stem cells (R² = 0.94/0.98 for CNS and midgut respectively). Interestingly, the greatest correlation outside of a lineage was between the two stem cell types (R² = 0.76), whilst differentiated cells exhibited only weak correlation (ISCs vs NSC, R² = 0.51).
This indicates that somatic stem cell types may utilise a broadly similar chromatin landscape for the maintenance of multipotency, whilst lineage-specific variation is relatively small.

Enhancer prediction from Dam accessibility data

Enhancer activity is closely linked to gene expression; therefore, many tissue-specific enhancers are required to orchestrate correct spatial and temporal transcription (Pennacchio et al., 2013). However, identification of functional enhancers can be challenging. Chromatin accessibility data have previously been used to identify novel enhancers (Davie et al., 2015; Crawford et al., 2006). We reasoned that it would be possible to identify genomic regions corresponding to cell-type-specific enhancers by comparing dynamically accessible regions between cell types. In support of this, we observed that the region covered by the R71C09-GAL4 line, used in this study to profile GMCs/newly born neurons, displayed a higher peak in the GMC/newly born neuron data than in either the stem cell or differentiated neuron data (Figure 4—figure supplement 1). Interestingly, a clear peak can still be observed in the NSC data, without concomitant reporter expression. Therefore, an enrichment of accessible chromatin does not necessarily correspond to an active enhancer in a given cell type. This is consistent with previous observations that DNase hypersensitive regions are often not active enhancers (Zhou et al., 2017; Thurman et al., 2012).

We selected accessible regions with large differences between at least two cell types in the lineage, which satisfied various criteria for us to designate them as putative enhancers (see Materials and methods). We then identified available reporter lines from the Vienna tiles (VT) (Kvon et al., 2014) and FlyLight (Jenett et al., 2012) collections of GAL4 reporter lines that contained sequences encompassing our predicted enhancers upstream of a GAL4 reporter, and verified their expression in the tissues of interest.
We identified enhancer-GAL4 lines in which reporter expression matched our predictions for enhancer activity. In the CNS, Vienna line VT017417 and FlyLight line GMR56E07 both showed expression in the early part of the lineage, with GFP reporter expression detectable predominantly in NSCs and GMCs (Figure 6A,B). This is consistent with accessible chromatin readings from our CATaDa data for these cell types, in which progenitor cells displayed prominent peaks, whereas differentiated neurons did not. Similarly, we were able to detect functional cell-type-specific enhancers in the midgut. The Vienna line VT004241 showed reporter expression predominantly in Delta-positive ISCs (Figure 6C). Therefore, it is possible to use CATaDa data to identify novel cell-type-specific enhancers.
