Abstract

BackgroundHi-C and capture Hi-C (CHi-C) are used to map physical contacts between chromatin regions in cell nuclei using high-throughput sequencing. Analysis typically proceeds considering the evidence for contacts between each possible pair of fragments independent from other pairs. This can produce long runs of fragments which appear to all make contact with the same baited fragment of interest.ResultsWe hypothesised that these long runs could result from a smaller subset of direct contacts and propose a new method, based on a Bayesian sparse variable selection approach, which attempts to fine map these direct contacts. Our model is conceptually novel, exploiting the spatial pattern of counts in CHi-C data. Although we use only the CHi-C count data in fitting the model, we show that the fragments prioritised display biological properties that would be expected of true contacts: for bait fragments corresponding to gene promoters, we identify contact fragments with active chromatin and contacts that correspond to edges found in previously defined enhancer-target networks; conversely, for intergenic bait fragments, we identify contact fragments corresponding to promoters for genes expressed in that cell type. We show that long runs of apparently co-contacting fragments can typically be explained using a subset of direct contacts consisting of <10% of the number in the full run, suggesting that greater resolution can be extracted from existing datasets.ConclusionsOur results appear largely complementary to those from a per-fragment analytical approach, suggesting that they provide an additional level of interpretation that may be used to increase resolution for mapping direct contacts in CHi-C experiments.

Highlights

  • Hi-C and capture Hi-C (CHi-C) are used to map physical contacts between chromatin regions in cell nuclei using high-throughput sequencing

  • It is accepted in genome-wide association studies (GWAS) that the p value corresponding to a test of association between a single genetic variant and some phenotype should be interpreted in the context of the p values of its neighbours, either by highlighting the variant in the region with the smallest p value, or by fitting a variable selection model to find a sparse subset of variants which could explain the association signals across the region

  • The primary difference between our CHi-C model and the class of GWAS fine mapping models that fit the association statistics directly (e.g. PAINTOR, [18]) is that the decay of GWAS association signals across genetic variants has been established to relate to the linkage disequilibrium or correlation between those variants within the population, while our model assumes an exponential decay specified by a single parameter ω

Read more

Summary

Objectives

Because our aim was to fit this model to 28,214 baits, we took some time to define a strategy for thinning these samples in order to perform reliable posterior inference on γb while minimising the computational burden (Additional file 1)

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call