csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows

Aaron T. L. Lun, Gordon K. Smyth

doi:10.1093/nar/gkv1191

Abstract

Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome, and it exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project.
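
The window-then-region strategy can be illustrated with a minimal Python sketch. This is not the csaw API or its implementation: every function name below is hypothetical, the per-window p-values are assumed to come from an external statistical test, and the choice of Simes' method to combine window p-values followed by a Benjamini-Hochberg adjustment across regions is an assumption made for illustration only.

```python
from bisect import bisect_left


def window_counts(read_starts, chrom_len, width=10, spacing=5):
    """Count reads whose start position falls in each window [s, s + width)."""
    starts = sorted(read_starts)
    windows = []
    for s in range(0, chrom_len - width + 1, spacing):
        n = bisect_left(starts, s + width) - bisect_left(starts, s)
        windows.append((s, s + width, n))
    return windows


def cluster_windows(windows, max_gap=0):
    """Merge (start, end, p) windows into regions.

    A window joins the current region when it overlaps it or lies
    within max_gap bases of its end.
    """
    regions = []
    for start, end, p in sorted(windows):
        if regions and start - regions[-1]["end"] <= max_gap:
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["pvalues"].append(p)
        else:
            regions.append({"start": start, "end": end, "pvalues": [p]})
    return regions


def simes(pvalues):
    """Simes combined p-value for a set of window p-values."""
    p = sorted(pvalues)
    n = len(p)
    return min(1.0, min(n * p[i] / (i + 1) for i in range(n)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def region_fdr(windows, max_gap=0):
    """One adjusted p-value per region, so the FDR refers to regions."""
    regions = cluster_windows(windows, max_gap)
    combined = [simes(r["pvalues"]) for r in regions]
    adjusted = benjamini_hochberg(combined)
    return [(r["start"], r["end"], q) for r, q in zip(regions, adjusted)]
```

The point of the design is that the multiple-testing adjustment is applied to one p-value per region rather than to individual windows, so the error rate is stated for the regions that are actually reported.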

Highlights

The ChIP-seq technique identifies protein–DNA interactions by massively parallel sequencing of DNA bound to a target protein.
Loss of control is still observed in a special simulation where the tests used by DiffBind and DBChIP should have optimal performance.
This is due to the lack of independence between peak calling and differential binding (DB) detection [12].

Summary

Introduction

The ChIP-seq technique identifies protein–DNA interactions by massively parallel sequencing of DNA bound to a target protein. Traditional analyses of ChIP-seq data involve identifying peaks of high read density in the genome, using software such as MACS [1], HOMER [2] or SICER [3]. These peaks represent putative binding sites for the target protein. Binding sites are considered present or absent in each sample, allowing qualitative comparisons between DNA samples or experimental conditions. However, strongly bound sites detected by peak calling are not necessarily biologically interesting if the intensity of binding does not change between treatment conditions.

Methods

Results

Conclusion