Abstract
HiChIP and PLAC-Seq are emerging technologies for studying genome-wide long-range chromatin interactions mediated by a protein of interest, enabling more sensitive and cost-efficient interrogation of protein-centric chromatin conformation. However, due to the unbalanced read distribution introduced by protein immunoprecipitation, existing reproducibility measures developed for Hi-C data are not appropriate for the analysis of HiChIP and PLAC-Seq data. Here, we present HPRep, a stratified and weighted correlation metric derived from normalized contact counts, to quantify reproducibility in HiChIP and PLAC-Seq data. We applied HPRep to multiple real datasets and demonstrated that it outperforms existing reproducibility measures developed for Hi-C data. Specifically, we applied HPRep to H3K4me3 PLAC-Seq data from mouse embryonic stem cells and mouse brain tissues, as well as H3K27ac HiChIP data from the human lymphoblastoid cell line GM12878 and the leukemia cell line K562, showing that HPRep can more clearly distinguish among pseudo-replicates, real replicates, and non-replicates. Furthermore, in an H3K4me3 PLAC-Seq dataset consisting of 11 samples from four human brain cell types, HPRep produced the expected clustering of data, which could not be achieved by existing methods developed for Hi-C data, highlighting the need for a reproducibility metric tailored to HiChIP and PLAC-Seq data.
Highlights
Chromatin spatial organization plays a critical role in genome structure and transcriptional regulation [1,2,3].
Although quantification of data reproducibility is critical to ensure scientific rigor, methods tailored for HiChIP and PLAC-Seq data are still lacking.
To fill this gap, we propose a novel method, HPRep, to measure the similarity or reproducibility between two HiChIP or PLAC-Seq (HP) datasets.
Summary
Chromatin spatial organization plays a critical role in genome structure and transcriptional regulation [1,2,3]. Several reproducibility measures have been developed for Hi-C data. HiCRep [6] first performs 2D smoothing to reduce the stochastic noise resulting from the sparsity of Hi-C data, and then quantifies reproducibility by calculating a stratified correlation, which is a weighted average of correlation coefficients between contact frequencies within specific one-dimensional (1D) genomic distance bands. QuASAR-Rep [9] determines a local correlation matrix by comparing observed interaction counts to background signal–distance values within a specified distance. This local correlation matrix is subsequently transformed by element-wise multiplication with a matrix of scaled interaction counts. The reproducibility between two samples is then defined as the Pearson correlation coefficient between the corresponding transformed matrices.
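The stratified correlation described above for HiCRep can be illustrated with a minimal Python sketch. This is not the HiCRep implementation: the 2D smoothing step is omitted, the function name `stratified_correlation` is hypothetical, and the per-band weight (number of bin pairs times the product of the two standard deviations) is a simplifying assumption chosen only to show how per-distance correlations are combined into a weighted average.

```python
import numpy as np


def stratified_correlation(a, b, max_offset):
    """Weighted average of per-distance Pearson correlations.

    a, b       : square contact matrices of identical shape (bins x bins).
    max_offset : largest diagonal offset (distance band, in bins) to include.

    Assumption: each band is weighted by n_d * sd(x_d) * sd(y_d), a simplified
    stand-in for the weighting used by published methods.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("a and b must be square matrices of the same shape")

    weighted_sum = 0.0
    weight_total = 0.0
    for d in range(max_offset + 1):
        # Contacts between bins separated by exactly d bins form one stratum.
        x = np.diagonal(a, offset=d)
        y = np.diagonal(b, offset=d)
        if x.size < 2:
            break  # farther bands have too few bin pairs to correlate
        sx, sy = x.std(), y.std()
        if sx == 0 or sy == 0:
            continue  # correlation is undefined for a constant band
        r = np.mean((x - x.mean()) * (y - y.mean())) / (sx * sy)
        w = x.size * sx * sy
        weighted_sum += w * r
        weight_total += w

    return weighted_sum / weight_total if weight_total > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(5, size=(50, 50)).astype(float)
    counts = counts + counts.T  # symmetrize, as in a contact matrix
    noisy = counts + rng.normal(0, 1, size=counts.shape)
    print(stratified_correlation(counts, noisy, max_offset=20))
```

Stratifying by distance matters because contact frequency decays strongly with genomic distance, so a single correlation over all bin pairs would be dominated by that shared decay rather than by sample-specific structure.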