Abstract

Background: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold that avoids recognition of too many false-positive BSs and to compare the actual performance of different models.

Results: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells, we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. The performance of conventional position weight matrix (PWM) models was inferior, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by the combination of the SiteGA and diChIPMunk/ChIPMunk models, which properly identified FoxA BSs in up to 90% of loci for both the mouse and human ChIP-Seq datasets.

Conclusions: The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. A combination of models based on different principles allowed identification of proper TFBSs.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-80) contains supplementary material, which is available to authorized users.

Highlights

  • Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins

  • The major class of these elements is represented by transcription factor (TF) binding sites (TFBSs), short DNA segments of 10-20 bp recognized by TFs

  • To produce a subset of data for experimental verification, we restricted the search to FoxA2-binding loci that overlapped with 1 kb upstream regions of RefSeq genes and had a coverage of at least 15 (301 promoters)

Introduction

ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. The major class of these elements is represented by TF binding sites (TFBSs), short DNA segments of 10-20 bp recognized by TFs. Modern high-throughput techniques, such as chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or by massively parallel sequencing (ChIP-Seq), allow genome-scale mapping of TF occupancy in a given cell type and state [1]. Thousands of binding loci for a large number of TFs have been revealed for various cell types [2]. Neither ChIP-Seq nor ChIP-chip can distinguish direct TF binding to DNA from indirect binding mediated by other chromatin proteins, including other TFs bound to their cognate DNA sites (so-called tethered or “piggy back” binding) [1,3]. Moreover, ChIP-Seq identifies the exact locations of TFBSs only indirectly and cannot discriminate between closely spaced multiple sites within DNA segments of hundreds of base pairs [4].
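Because a ChIP-Seq peak does not pinpoint the site itself, candidate TFBSs within a peak are usually located computationally by scoring each sequence window against a motif model and keeping the windows above a threshold. The following is a minimal sketch of that idea for a log-odds PWM. It is not the oPWM model or the FoxA matrix used in the study: the 4-bp toy count matrix, the pseudocount, the uniform background, the example sequence, and the function names `counts_to_log_odds` and `scan` are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical toy motif "TGTT" built from 10 aligned sites (not real FoxA data).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
toy_pwm = counts_to_log_odds(toy_counts)

# Hypothetical peak sequence containing exact and single-mismatch matches.
peak = "ACGTGTTACCTGTTGATATTC"

strict_hits = scan(peak, toy_pwm, threshold=5.0)
relaxed_hits = scan(peak, toy_pwm, threshold=3.0)

print([start for start, _, _ in strict_hits])
print([start for start, _, _ in relaxed_hits])
```

In this toy case the stricter threshold reports only the exact matches, while the relaxed threshold also admits single-mismatch windows. That trade-off is why the threshold choice matters: without experimental evidence about which predicted sites actually bind, there is no principled way to decide where to draw the line. A fuller scan would also score the reverse-complement strand.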
