Abstract

We present a mixture model-based analysis for identifying differences in the distribution of RNA polymerase II (Pol II) in transcribed regions, measured using ChIP-seq (chromatin immunoprecipitation following massively parallel sequencing technology). The statistical model assumes that the number of Pol II-targeted sequences contained within each genomic region follows a Poisson distribution. A Poisson mixture model was then developed to distinguish Pol II binding changes in transcribed region using an empirical approach and an expectation-maximization (EM) algorithm developed for estimation and inference. In order to achieve a global maximum in the M-step, a particle swarm optimization (PSO) was implemented. We applied this model to Pol II binding data generated from hormone-dependent MCF7 breast cancer cells and antiestrogen-resistant MCF7 breast cancer cells before and after treatment with 17β-estradiol (E2). We determined that in the hormone-dependent cells, ~9.9% (2527) genes showed significant changes in Pol II binding after E2 treatment. However, only ~0.7% (172) genes displayed significant Pol II binding changes in E2-treated antiestrogen-resistant cells. These results show that a Poisson mixture model can be used to analyze ChIP-seq data.

Highlights

  • Parallel sequencing is a high-throughput technology capable of sequencing hundreds of thousands of DNA fragments in a single experiment

  • A Poisson mixture model to identify transcripts with different polymerase II (Pol II) binding quantity Based on the assumption that the number of Pol II-binding fragments detected within gene transcript region, including 5'- and 3'-untranslated regions and open reading frames, follows a Poisson distribution, we developed a mixture model to identify differences in Pol II binding under two conditions, control vs. treatment

  • Genome-wide identification of Pol II binding in breast cancer cell lines We tested our model on the Pol II binding quantity in MCF7 and OHT-MCF7 breast cancer cell lines (+/- E2 treatment) derived from the use of ChIP-seq technology

Read more

Summary

Introduction

Parallel sequencing is a high-throughput technology capable of sequencing hundreds of thousands of DNA fragments in a single experiment. An underlying assumption of current approaches is that DNA-binding proteins, such as transcription factors, contain sequence-specific, DNA binding domains that target a cluster of cis-acting DNA elements sharing certain sequence features. While such algorithms can identify DNA binding sites for highly specific transcription factors, current approaches are not appropriate for identifying binding sites for the general transcriptional machinery, such as RNA polymerase II (Pol II), which typically does not display high sequence specificity. Our strategy is based on the underlying assumption that the number of Pol II-target sequences follows a Poisson distribution and can be used to identify differentially transcribed genes under different experimental conditions. Our proposed methodology can be used for making statistical inferences in experiments for which replicates are not available

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.