Abstract
Chromatin immunoprecipitation followed by massively parallel, high-throughput sequencing (ChIP-seq) is the method of choice for genome-wide identification of DNA segments bound by specific transcription factors or located in chromatin with particular histone modifications. However, the quality of ChIP-seq datasets varies widely, with a substantial fraction being of intermediate to poor quality. Thus, it is important to discern and control the factors that contribute to variation in ChIP-seq. In this study, we focused on sonication, the user-controlled step that produces sheared chromatin. We systematically varied the amount of shearing of fixed chromatin from a mouse erythroid cell line, carefully measuring the distribution of the resulting fragment lengths prior to ChIP-seq. This systematic study was complemented with a retrospective analysis of additional experiments. We found that the level of sonication had a pronounced impact on the quality of ChIP-seq signals. Over-sonication consistently reduced quality, while the impact of under-sonication differed among transcription factors, with no impact on sites bound by CTCF but frequent loss of sites occupied by TAL1 or bound by POL2. The bound sites not observed in low-quality datasets were inferred to be a mix of direct and indirect binding. We leveraged these findings to produce a set of CTCF ChIP-seq datasets in rare, primary hematopoietic progenitor cells. Our observation that the amount of chromatin sonication is a key variable in the success of ChIP-seq experiments indicates that monitoring the level of sonication can improve ChIP-seq quality and reproducibility and facilitate ChIP-seq in rare cell types.
Highlights
Chromatin immunoprecipitation followed by massively parallel, high-throughput sequencing (ChIP-seq) has been used extensively to produce thousands of genome-wide maps of DNA segments bound by specific transcription factors (TFs) or located in chromatin with particular histone modifications.
As expected, chromatin size was strongly negatively correlated with the number of sonication cycles used to generate both the CTCF ChIP-seq datasets (R = -0.97 for 50M cells, R = -0.93 for 20M cells; Figure 2A) and the TAL1 ChIP-seq datasets (R = -0.94 for 50M cells, R = -0.85 for 20M cells; Figure 2B).
A total of 12 retrospective datasets with independent chromatin sonications met these criteria. These ChIP-seq experiments were conducted under a variety of conditions, including different fixation methods, which confound our assessment of quality, but they provided an opportunity to determine whether a relationship between chromatin size and quality could still be detected. After assessing their quality both by subjective inspection of ChIP-seq signals and by objective ENCODE quality metrics (Supplemental Table S2), we found that a very low average chromatin size, a Fraction of Reads in Peaks (FRiP) score below 1%, or a low Relative Strand Correlation (RSC) was associated with failure of the ChIP-seq experiment (Figure 4).
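The two simplest quantities quoted in these highlights, the correlation coefficient R between sonication cycles and chromatin size and the FRiP score, can be illustrated with a minimal Python sketch. It assumes that R is a Pearson correlation coefficient and that FRiP is the number of mapped reads overlapping called peaks divided by the total number of mapped reads; the function names and all input values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def frip(reads_in_peaks, total_mapped_reads):
    """Fraction of Reads in Peaks: reads in called peaks / all mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if not 0 <= reads_in_peaks <= total_mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and the total")
    return reads_in_peaks / total_mapped_reads


# Hypothetical values for illustration only; NOT data from the study.
cycles = [10, 20, 30, 40, 50]                 # sonication cycles
mean_fragment_bp = [900, 600, 420, 300, 220]  # mean chromatin size (bp)

r = pearson_r(cycles, mean_fragment_bp)
score = frip(reads_in_peaks=150_000, total_mapped_reads=20_000_000)

print(f"R = {r:.2f}")
# The 1% cutoff mirrors the FRiP threshold mentioned in the text.
print(f"FRiP = {score:.2%}, below 1% threshold: {score < 0.01}")
```

In practice the read counts would come from an alignment file and a peak file rather than hand-entered numbers; RSC requires a strand cross-correlation analysis and is not sketched here.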
Summary
Chromatin immunoprecipitation followed by massively parallel, high-throughput sequencing (ChIP-seq) has been used extensively to produce thousands of genome-wide maps of DNA segments bound by specific transcription factors (TFs) or located in chromatin with particular histone modifications. These ChIP-seq datasets vary widely in quality. Among 7547 ChIP-seq datasets across four species (as of July 14, 2020), about 300 red flags and about 3000 orange flags had been assigned (a single dataset can carry multiple flags), illustrating a serious, but not crippling, issue. These studies of reproducibility and data quality all illustrate the variable quality of ChIP-seq datasets. It is important to discern and control the factors that contribute to variation in ChIP-seq.