Abstract
BackgroundIn eukaryotes, most DNA-binding proteins exert their action as members of large effector complexes. The presence of these complexes are revealed in high-throughput genome-wide assays by the co-occurrence of the binding sites of different complex components. Resampling tests are one route by which the statistical significance of apparent co-occurrence can be assessed.ResultsWe have investigated two resampling approaches for evaluating the statistical significance of binding-site co-occurrence. The permutation test approach was found to yield overly favourable p-values while the independent resampling approach had the opposite effect and is of little use in practical terms. We have developed a new, pragmatically-devised hybrid approach that, when applied to the experimental results of an Polycomb/Trithorax study, yielded p-values consistent with the findings of that study. We extended our investigations to the FL method developed by Haiminen et al, which derives its null distribution from all binding sites within a dataset, and show that the p-value computed for a pair of factors by this method can depend on which other factors are included in that dataset. Both our hybrid method and the FL method appeared to yield plausible estimates of the statistical significance of co-occurrences although our hybrid method was more conservative when applied to the Polycomb/Trithorax dataset.A high-performance parallelized implementation of the hybrid method is available.ConclusionsWe propose a new resampling-based co-occurrence significance test and demonstrate that it performs as well as or better than existing methods on a large experimentally-derived dataset. We believe it can be usefully applied to data from high-throughput genome-wide techniques such as ChIP-chip or DamID. The Cooccur package, which implements our approach, accompanies this paper.
Highlights
In eukaryotes, most DNA-binding proteins exert their action as members of large effector complexes
The classical statistical methods rely on obtaining the null distribution for a statistic that correlates with the phenomenon of interest
We were interested to determine whether their framework circumvents the drawbacks we identified with a permutation test approach assessed with the hypergeometric distribution
Summary
Most DNA-binding proteins exert their action as members of large effector complexes. The presence of these complexes are revealed in high-throughput genome-wide assays by the co-occurrence of the binding sites of different complex components. A large number of proteins are known to bind DNA in a location-specific manner. These include transcription factors, replication factors and chromatin components. When the binding sites of the individual proteins within a complex are determined by genome-wide high-throughput assays, these complexes are revealed as regions where the binding sites of multiple proteins are clustered. Many methods have been proposed for assessing the statistical significance of such clusters (reviewed in [1]). We will discuss how we have addressed these questions when considering the merit of a particular test and when devising an improved test
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.