ChIPnorm: A Statistical Method for Normalizing and Identifying Differential Regions in Histone Modification ChIP-seq Libraries

Nishanth Ulhas Nair, Philipp Bucher, Avinash Das Sahu, Bernard M. E. Moret, Leonardo Mariño-Ramírez

doi:10.1371/journal.pone.0039573

Abstract

The advent of high-throughput technologies such as ChIP-seq has made it possible to study histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data and to find differential regions in the genome, given two libraries of histone modifications from different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that the histone modifications H3K27me3 and H3K4me3 act as a repressor and an activator of genes, respectively. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 moves into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

Highlights

Histones are proteins that package the DNA into chromosomes [1].
Embryonic stem (ES) data has a better S/N ratio, as well as more peaks in gene-rich regions than in gene-poor regions. These characteristics introduce a bias that must be eliminated before comparing ES data to neural progenitor (NP) data, as can be seen in the results of the ChIPDiff method [9] in the same figure: most of the differentially NP-enriched regions proposed by ChIPDiff fall within gene-poor regions and are almost certainly false positives.
We have presented an approach for the analysis of chromatin immunoprecipitation (ChIP)-seq data, with particular emphasis on the discovery of differentially enriched histone-modification sites.

Summary

Introduction

Histones are proteins that package the DNA into chromosomes [1]. They are subject to various types of modifications, such as methylation, acetylation, phosphorylation, and ubiquitination, which alter their interaction with the DNA and nuclear proteins, thereby influencing transcription and genomic function. These modifications form an important category of epigenetic changes, which help us understand why various types of cells exhibit very different behaviors in spite of their shared genome. Thanks to advances in sequencing technologies, ChIP-seq has become the main approach for capturing histone modifications, due to its high throughput, high resolution, and low cost [5,6,7]. In the ChIP-seq process, the sequence of one end of the DNA fragment is read to provide a tag, which is mapped to an assembled genome to determine the location of the DNA fragment.
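
To make the tag-mapping step concrete, the following minimal sketch counts mapped tags in fixed-width genomic bins, a common first step before two libraries are compared. It is an illustration only, not the ChIPnorm procedure: the function name `count_tags_per_bin`, the 1000 bp bin size, and the toy tag coordinates are all assumptions introduced here and do not come from the paper or its software.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, bin_size=1000):
    """Count mapped tags falling in each fixed-width genomic bin.

    tag_positions: iterable of (chromosome, position) pairs, 0-based.
    bin_size: bin width in base pairs (assumed value, not from the paper).
    Returns a Counter keyed by (chromosome, bin_index).
    """
    counts = Counter()
    for chrom, pos in tag_positions:
        # Integer division assigns each tag to the bin containing it.
        counts[(chrom, pos // bin_size)] += 1
    return counts


# Toy mapped-tag locations (hypothetical data for illustration).
tags = [("chr1", 150), ("chr1", 980), ("chr1", 1020), ("chr2", 5)]
bin_counts = count_tags_per_bin(tags)
```

Bins with no tags are simply absent from the result and read as zero when looked up, which keeps the representation sparse for large genomes.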

Objectives

Methods

Results

Conclusion