Abstract

As next-generation sequencing technologies become more economical, large-scale ChIP-seq studies are enabling investigation of the roles of transcription factor binding and the epigenome in phenotypic variation. Studying such variation requires individual-level ChIP-seq experiments. Standard designs for ChIP-seq experiments employ a paired control for each ChIP-seq sample, and genomic coverage for the control experiments is often sacrificed to increase the resources available for the ChIP samples. However, the quality of the ChIP-enriched regions identifiable from a ChIP-seq experiment depends on the quality and coverage of the control experiments, and insufficient coverage leads to a loss of power in detecting enrichment. We investigate the effect of in silico pooling of control samples within multiple biological replicates, multiple treatment conditions, and multiple cell lines and tissues, across multiple datasets with varying levels of genomic coverage. Our computational studies suggest guidelines for performing in silico pooling of control experiments. Using a large collection of ENCODE data, we show that pairwise correlations between control samples originating from multiple biological replicates, treatments, and cell lines/tissues can be grouped into two classes, according to whether or not in silico pooling leads to a power gain in detecting enrichment between the ChIP and the control samples. Our findings have important implications for multiplexing samples.

Highlights

  • Control experiments such as input DNA or ChIP with a non-specific antibody (e.g., IgG anti-serum) are commonly used to estimate the background read distribution and have been shown to be critical for identifying enriched regions in ChIP-seq experiments [1,2,3,4].

  • Sequencing a paired control for each ChIP sample typically limits the sequencing depth of the controls, since resources are divided between ChIP and control samples, and leaves investigators unsure whether their control samples have sufficient genomic coverage.

  • We investigate in silico pooling designs for individually sequenced control samples, made practical by recent developments in sequencing technology (e.g., the Illumina HiSeq 2000 and multiplexing).
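As a rough illustration of what in silico pooling guided by pairwise correlations could look like, the sketch below sums the per-bin read counts of a target control with those of other controls that correlate strongly with it. This is not the authors' procedure: the function names, the assumption that reads have already been counted in fixed genomic bins, and the 0.9 cutoff are all hypothetical placeholders, since the text here gives no threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of bin counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def pool_controls(target, controls, min_corr=0.9):
    """Pool the target control with sufficiently similar controls.

    controls maps a sample name to its per-bin read counts (assumed to be
    precomputed over the same bins). min_corr is an arbitrary illustrative
    cutoff, not a value taken from the study.
    Returns (pooled per-bin counts, names of the samples that were pooled).
    """
    reference = controls[target]
    pooled = list(reference)
    used = [target]
    for name, counts in controls.items():
        if name == target:
            continue
        if pearson(reference, counts) >= min_corr:
            pooled = [p + c for p, c in zip(pooled, counts)]
            used.append(name)
    return pooled, used


# Toy example: rep2 tracks rep1 closely, "other" has the opposite profile.
controls = {
    "rep1": [10, 20, 30, 40],
    "rep2": [12, 19, 33, 41],
    "other": [40, 30, 20, 10],
}
pooled, used = pool_controls("rep1", controls)
print(used, pooled)
```

In this toy input only rep2 is pooled with rep1, so the pooled control has higher coverage in every bin while the dissimilar sample is left out.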


Introduction

Control experiments such as input DNA (whole cell extract) or ChIP with a non-specific antibody (e.g., IgG anti-serum) are commonly used to estimate the background read distribution and have been shown to be critical for identifying enriched regions in ChIP-seq experiments [1,2,3,4].
