Abstract

Background: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competition in comparison studies.

Results: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data and which can naturally incorporate mappability information. The model therefore estimates the total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. A comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and in locating them. The model also provides a goodness-of-fit test to screen out spurious peaks and to infer multiple binding events in a region.

Conclusions: The NEXT-peak program calls peaks on any test dataset about as accurately as any other program, but it locates the peaks it calls with unusual accuracy. Because NEXT-peak is based on rigorous statistics, its model also provides a principled foundation for more elaborate statistical analyses of ChIP-seq data.
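The abstract does not spell out the model's equations. As a rough illustration only, the sketch below assumes one plausible reading of "normal-exponential two-peak": each tag's position is the binding location `mu` plus normal noise with standard deviation `sigma`, shifted downstream (reverse strand) or upstream (forward strand) by an exponentially distributed offset with rate `lam`. Under that assumption each strand's tag density is an exponentially modified Gaussian, and the two strands form two mirror-image peaks flanking `mu`. All function names, parameters, and the strand convention are hypothetical; the sketch ignores mappability, censoring, and background, and it is not the NEXT-peak program's implementation.

```python
import math
import random


def emg_pdf(x, mu, sigma, lam):
    """Density at x of mu + N(0, sigma^2) + Exponential(rate=lam),
    i.e. an exponentially modified Gaussian (normal-exponential convolution)."""
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(0.5 * lam * (2 * mu + lam * sigma ** 2 - 2 * x)) * math.erfc(z)


def strand_density(x, strand, mu, sigma, lam):
    """Assumed tag density on one strand around a binding site at mu.
    '-' tags lie downstream of mu; '+' tags are the mirror image about mu."""
    if strand == "-":
        return emg_pdf(x, mu, sigma, lam)
    if strand == "+":
        return emg_pdf(2 * mu - x, mu, sigma, lam)
    raise ValueError("strand must be '+' or '-'")


def simulate_tags(n, mu, sigma, lam, rng):
    """Draw n (position, strand) tags from the assumed generative process."""
    tags = []
    for _ in range(n):
        strand = rng.choice("+-")
        offset = rng.expovariate(lam)   # hypothetical exponential offset
        noise = rng.gauss(0.0, sigma)   # normal error around the site
        pos = mu + noise + (offset if strand == "-" else -offset)
        tags.append((pos, strand))
    return tags


def trapezoid(f, a, b, step=1.0):
    """Trapezoid-rule integral of f over [a, b] with a fixed step."""
    n = int(round((b - a) / step))
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


if __name__ == "__main__":
    mu, sigma, lam = 0.0, 20.0, 1 / 80
    for s in "+-":
        mass = trapezoid(lambda x: strand_density(x, s, mu, sigma, lam), -1500, 1500)
        mean = trapezoid(lambda x: x * strand_density(x, s, mu, sigma, lam), -1500, 1500)
        print(s, round(mass, 4), round(mean, 1))
```

Under these assumptions each strand's peak is displaced from `mu` by about `1/lam` on average, so the midpoint of the two strand means recovers the binding location.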

Highlights

  • Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale

  • We considered every peak called within 250 bp of a candidate site to be a true positive (TP)

  • The NEXT-peak program tests the validity of its underlying NEXT-peak model without depending on an unrealistic assumption of a global uniform background tag distribution
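The true-positive criterion in the second highlight can be applied with a sorted list of candidate sites per chromosome and a binary search. This is a minimal sketch assuming peaks and sites are given as hypothetical (chromosome, position) pairs and that "within 250 bp" is inclusive; how the original study handled several peaks near one site, or how it defined site coordinates, is not stated here.

```python
import bisect


def count_true_positives(peaks, sites, window=250):
    """Count called peaks lying within `window` bp (inclusive) of any candidate site.

    peaks, sites: iterables of (chromosome, position) pairs (hypothetical format).
    Each peak is counted at most once.
    """
    # Group candidate-site positions by chromosome and sort them for bisection.
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    tp = 0
    for chrom, pos in peaks:
        positions = by_chrom.get(chrom)
        if not positions:
            continue
        # First site at or after pos - window; it is a match if it is also <= pos + window.
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            tp += 1
    return tp


if __name__ == "__main__":
    peaks = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
    sites = [("chr1", 1200), ("chr2", 600)]
    tp = count_true_positives(peaks, sites)
    print("TP:", tp, "FP:", len(peaks) - tp)
```

Peaks not counted as true positives can then be treated as false positives, as in the usage lines above.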

Introduction

ChIP-seq experiments use chromatin immunoprecipitation and high-throughput sequencing, primarily to locate transcription factor binding sites across entire genomes and to improve our understanding of biological control systems [1]. In brief, the relevant experimental protocols begin by reversibly crosslinking a transcription factor (TF) molecule to its binding site in genomic DNA. The DNA is then sheared into millions of short sequence fragments. Antibodies to the TF precipitate each TF molecule along with its attached fragment. Fragments are dissociated from the TF molecules, amplified by polymerase
