Abstract

DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA[1–5] and contain genetic variations associated with diseases and phenotypic traits[6–8]. We created high-resolution maps of DHSs from 733 human biosamples encompassing 438 cell and tissue types and states, and integrated these to delineate and numerically index approximately 3.6 million DHSs within the human genome sequence, providing a common coordinate system for regulatory DNA. Here we show that these maps highly resolve the cis-regulatory compartment of the human genome, which encodes unexpectedly diverse cell- and tissue-selective regulatory programs at very high density. These programs can be captured comprehensively by a simple vocabulary that enables the assignment to each DHS of a regulatory barcode that encapsulates its tissue manifestations, and global annotation of protein-coding and non-coding RNA genes in a manner orthogonal to gene expression. Finally, we show that sharply resolved DHSs markedly enhance the genetic association and heritability signals of diseases and traits. Rather than being confined to a small number of distal elements or promoters, we find that genetic signals converge on congruently regulated sets of DHSs that decorate entire gene bodies. Together, our results create a universal, extensible coordinate system and vocabulary for human regulatory DNA marked by DHSs, and provide a new global perspective on the architecture of human gene regulation.

Highlights

  • The advent of genome-scale mapping of DHSs[12–15] and its application to diverse human and mouse cell and tissue types[16,17] has yielded many insights into the organization[16], evolution[17,18,19], activity[15,16,20], and function[16,21,22] of human regulatory DNA in both normal and malignant states[23]

  • A cardinal property of regulatory DNA is that its accessibility is cell type- and state-selective, with only a small fraction of all genome-encoded elements becoming actuated in a given cellular context[16,23]

  • The overwhelming majority of disease- and trait-associated variants identified by genome-wide association studies (GWASs) lie in non-coding regions of the genome, and these variants are most strongly enriched in DHSs mapped in disease-relevant cell contexts[6,7]


Summary


[Figure panel: component-matching tissue and cell types; number of expression datasets used indicated.]

We identified 189,318 DHSs genome-wide (per TF median 149, IQR 47–477 DHSs) that (i) were exclusively annotated by a component matching that of the TF gene, and (ii) showed occupancy of the cognate motif by footprinting[25] in a component-matched biosample (Fig. 3h). Such DHSs are likely to be highly functionally dependent on their associated TF, and should provide a rich substrate for experimental manipulations to investigate connections between TFs and regulatory functions.

Concordant DHSs strongly contributed to SNP-based trait heritability relative to DHSs that were found in the same genes but with component annotations discordant with the annotations of the underlying gene, despite having lower average DNase-seq signal levels (Extended Data Fig. 9g) and more specialized utilization patterns (occurring in an average of 15 versus 25 biosamples). Rather than being confined to a small number of distal elements or promoters, it appears that genetic association signals are concentrated within congruently regulated sets of DHSs that decorate entire gene bodies.
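The two selection criteria for TF-dependent DHSs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `DHS` record, its fields, the function `tf_dependent_dhss`, and the component and TF labels are all hypothetical, and footprint occupancy is assumed to have been computed upstream and stored per DHS.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class DHS:
    """Hypothetical record for one indexed DHS (field names are assumptions)."""
    dhs_id: str
    components: frozenset       # component labels annotating this DHS
    footprinted_tfs: frozenset  # TFs whose motif is footprinted in a
                                # component-matched biosample (precomputed)


def tf_dependent_dhss(dhss, tf_component):
    """Group DHS ids by TF, keeping only DHSs that meet both criteria.

    (i)  the DHS is annotated exclusively by the component of the TF gene;
    (ii) the TF's cognate motif shows footprint occupancy at the DHS.
    """
    selected = defaultdict(list)
    for dhs in dhss:
        for tf in dhs.footprinted_tfs:            # criterion (ii)
            comp = tf_component.get(tf)
            if comp is not None and dhs.components == frozenset({comp}):
                selected[tf].append(dhs.dhs_id)   # criterion (i)
    return dict(selected)


# Toy data; labels are illustrative only.
dhss = [
    DHS("dhs1", frozenset({"lymphoid"}), frozenset({"TF_A"})),
    DHS("dhs2", frozenset({"lymphoid", "myeloid"}), frozenset({"TF_A"})),  # fails (i)
    DHS("dhs3", frozenset({"lymphoid"}), frozenset()),                     # fails (ii)
    DHS("dhs4", frozenset({"neural"}), frozenset({"TF_A", "TF_B"})),
    DHS("dhs5", frozenset({"lymphoid"}), frozenset({"TF_A"})),
]
tf_component = {"TF_A": "lymphoid", "TF_B": "neural"}

result = tf_dependent_dhss(dhss, tf_component)
per_tf_counts = [len(ids) for ids in result.values()]
print(result)
print("per-TF median:", median(per_tf_counts))
```

A per-TF summary such as the median and IQR quoted above would be computed the same way, from the per-TF list lengths over the full index.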
