Abstract

High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and are thought to have biological importance because of their high transcription factor occupancy. Furthermore, they coincide with housekeeping gene promoters and, consequently, the associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are defined solely by ChIP-seq experiments and have been shown to lack canonical motifs for the transcription factors that are thought to be bound there. Although ChIP-seq experiments are the gold standard for finding the genome-wide binding sites of a protein, they are not noise-free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and that they are similar to the previously proposed ‘hyper-ChIPable’ regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false-positive signals at HOT regions. We observe sequence characteristics and genomic features that discriminate HOT regions, such as GC/CpG-rich k-mers and enrichment of RNA–DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment at HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies for dealing with such artifacts in future ChIP-seq studies.

Highlights

  • Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a standard method to quantitatively assay the binding sites of a DNA-binding protein in the genome

  • High-occupancy target (HOT) regions exist in multiple species and cover transcription start sites (TSS) of genes that are stably expressed across cell types

  • HOT regions are observed in multiple species: human [3,20], D. melanogaster [21], yeast [22] and C. elegans [20,23]

Introduction

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a standard method to quantitatively assay the binding sites of a DNA-binding protein in the genome. Large-scale projects such as ENCODE [1] and modENCODE [2] used this technology to find the binding sites of hundreds of proteins in multiple species. With more binding site data available, it has become apparent that certain parts of the genome harbour a high frequency of protein-DNA binding events. These regions are called high-occupancy target (HOT) regions and they are observed in multiple species [3,4]. HOT regions are thought to have biological importance because of the high number of binding sites observed, but previous reports failed to assign a clearly distinctive function that would explain the requirement for the exuberant number of bound transcription factors.
