Abstract
A growing body of evidence in the literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer–target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains that define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work we (i) develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) demonstrate that incorporating chromatin 3D architecture information improves ETG pairing accuracy; and (iii) use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole-genome sequencing in research on cancer and undiagnosed genetic diseases.
Highlights
Distal non-coding regulatory elements are crucial players in the control of gene expression
We present a general framework for the definition of enhancer–target gene (ETG) pairs leveraging the current biological knowledge on chromatin 3D architecture and integrating heterogeneous functional genomics data into a rigorous statistical framework
Summary
Distal non-coding regulatory elements (enhancers) are crucial players in the control of gene expression. These are the genomic features carrying the most marked epigenetic differences across cell types, constituting a fundamental component of the molecular and genetic mechanisms defining cell identity [1,2]. The formation of chromatin loops allows distal regulatory regions to come into close physical proximity with their target gene promoters to regulate transcription [4]. Their importance for human physiology is attested by their enrichment in polymorphisms associated with genetic diseases and cancer risk [5,6]. Understanding their effect on the broader gene regulatory network would be instrumental for the annotation and interpretation of non-coding somatic mutations or germline sequence variants, in basic biology as well as in more translational studies.