Abstract

Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of ‘epromoters’: dual-action CRMs with promoter and distal enhancer activity.

Highlights

  • Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control and its aberrations

  • We have generated an atlas of CRM variants predicted to affect transcription factor (TF) binding in lymphoblastoid cell lines (LCLs) and established their associations with the expression of their putative target genes

  • The key methodological innovations of our work are the prioritization and pooling of variants at the CRM level using a biophysical model of TF binding affinity, as well as the prioritization of CRM target genes based on high-resolution Promoter Capture Hi-C (PCHi-C) data



Introduction

Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control and its aberrations. While expression quantitative trait loci (eQTLs) identified this way have provided important insights into gene control and the mechanisms of specific diseases [13,14], a number of challenges hamper comprehensive detection of functional sequences in ‘brute-force’ eQTL testing [15,16]. The immense search space imposes a heavy multiple testing burden, which reduces sensitivity. This problem is typically mitigated in part by testing for ‘cis-eQTLs’ separately within a limited distance window (∼100 kb); this range, however, is an order of magnitude shorter than that of known distal CRM activity [17,18,19]. Correlation structure arising from linkage disequilibrium (LD) requires dis-
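
The multiple testing burden described above can be illustrated with a minimal sketch. It uses a Bonferroni correction purely as a simple stand-in; the source does not state which correction is applied. The gene and variant counts are hypothetical round numbers chosen for illustration and are not taken from the study, and the function name `bonferroni_threshold` is likewise an assumption.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical counts, for illustration only (not from the paper).
n_genes = 20_000
variants_genome_wide = 1_000_000   # every variant tested against every gene
variants_per_cis_window = 500      # variants inside one gene's cis window

# Number of variant-gene pairs tested under each design.
all_pairs = n_genes * variants_genome_wide
cis_pairs = n_genes * variants_per_cis_window

genome_wide_cutoff = bonferroni_threshold(0.05, all_pairs)
cis_cutoff = bonferroni_threshold(0.05, cis_pairs)

print(f"all pairs: {all_pairs:.1e} tests, cutoff {genome_wide_cutoff:.1e}")
print(f"cis only:  {cis_pairs:.1e} tests, cutoff {cis_cutoff:.1e}")
```

Under these assumed counts, restricting tests to a cis window relaxes the per-test cutoff by several orders of magnitude, at the price of never testing variants that lie outside the window.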
