Abstract

HiChIP and PLAC-Seq are emerging technologies for studying genome-wide long-range chromatin interactions mediated by a protein of interest, enabling more sensitive and cost-efficient interrogation of protein-centric chromatin conformation. However, due to the unbalanced read distribution introduced by protein immunoprecipitation, existing reproducibility measures developed for Hi-C data are not appropriate for the analysis of HiChIP and PLAC-Seq data. Here, we present HPRep, a stratified and weighted correlation metric derived from normalized contact counts, to quantify reproducibility in HiChIP and PLAC-Seq data. We applied HPRep to multiple real datasets and demonstrated that it outperforms existing reproducibility measures developed for Hi-C data. Specifically, we applied HPRep to H3K4me3 PLAC-Seq data from mouse embryonic stem cells and mouse brain tissues, as well as H3K27ac HiChIP data from the human lymphoblastoid cell line GM12878 and the leukemia cell line K562, showing that HPRep distinguishes more clearly among pseudo-replicates, real replicates, and non-replicates. Furthermore, in an H3K4me3 PLAC-Seq dataset consisting of 11 samples from four human brain cell types, HPRep produced the expected clustering of samples, which could not be achieved by existing methods developed for Hi-C data, highlighting the need for a reproducibility metric tailored to HiChIP and PLAC-Seq data.

Highlights

  • Chromatin spatial organization plays a critical role in genome structure and transcriptional regulation [1,2,3]

  • We propose a novel method, HPRep, to measure the similarity or reproducibility between two HiChIP or PLAC-Seq (HP) datasets

  • Quantification of data reproducibility is critical to ensure scientific rigor, yet methods tailored for HiChIP and PLAC-Seq data are still lacking

Introduction

Chromatin spatial organization plays a critical role in genome structure and transcriptional regulation [1,2,3].

HiCRep [6] first performs 2D smoothing to reduce the stochastic noise resulting from the sparsity of Hi-C data, and then quantifies reproducibility by calculating a stratified correlation, which is a weighted average of correlation coefficients between contact frequencies within specific one-dimensional (1D) genomic distance bands. QuASAR-Rep [9] determines a local correlation matrix by comparing observed interaction counts to background signal–distance values within a specified distance. This local correlation matrix is subsequently transformed by element-wise multiplication with a matrix of scaled interaction counts. The reproducibility between two samples is then defined as the Pearson correlation coefficient between the corresponding transformed matrices.
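
To make the idea of a stratified correlation concrete, the following is a minimal Python sketch, not the HiCRep implementation. It assumes two square, symmetric contact matrices binned at the same resolution, treats each matrix diagonal as one 1D distance band, and weights each band by its number of bin pairs times the product of the two standard deviations; this weighting is a simplifying assumption made for illustration. The function names `box_smooth` and `stratified_correlation` and the parameters `h` and `max_dist` are hypothetical.

```python
import numpy as np


def box_smooth(mat, h=1):
    """Mean-filter a 2D matrix over a (2h+1) x (2h+1) window.

    Cells near the border are averaged over the part of the window
    that lies inside the matrix.
    """
    n, m = mat.shape
    padded = np.pad(mat.astype(float), h, mode="constant", constant_values=0.0)
    valid = np.pad(np.ones((n, m)), h, mode="constant", constant_values=0.0)
    total = np.zeros((n, m))
    cells = np.zeros((n, m))
    for di in range(2 * h + 1):
        for dj in range(2 * h + 1):
            total += padded[di:di + n, dj:dj + m]
            cells += valid[di:di + n, dj:dj + m]
    return total / cells


def stratified_correlation(a, b, max_dist):
    """Weighted average of per-distance-band Pearson correlations.

    Band d is the d-th off-diagonal of each matrix. The weights
    (band size times product of standard deviations) are an
    illustrative assumption, not the published HiCRep weights.
    """
    num = 0.0
    den = 0.0
    for d in range(1, max_dist + 1):
        x = np.diagonal(a, offset=d)
        y = np.diagonal(b, offset=d)
        sx, sy = x.std(), y.std()
        if len(x) < 2 or sx == 0 or sy == 0:
            continue  # correlation is undefined for this band
        rho = np.corrcoef(x, y)[0, 1]
        w = len(x) * sx * sy
        num += w * rho
        den += w
    return num / den if den > 0 else float("nan")
```

In this sketch, smoothing and scoring are separate steps, so a comparison would be written as `stratified_correlation(box_smooth(a), box_smooth(b), max_dist)`. Stratifying by distance prevents the strong decay of contact frequency with genomic distance from dominating the correlation.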
