Abstract

Chromatin is a tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and occupancy levels of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. In contrast, epigenomic accessibility data like MNase-seq, DNase-seq, and ATAC-seq provide insight into the chromatin landscape of all factors bound along the genome, but with little insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin accessibility data with nucleotide sequence to jointly compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors. We apply RoboCOP to MNase-seq and ATAC-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome, and show that our model makes better predictions than existing methods. We also compute a chromatin occupancy profile of the yeast genome under cadmium stress, revealing chromatin dynamics associated with transcriptional regulation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call