Abstract
Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular, the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Polya tree priors on the space of probability measures, which can then be embedded within a decision theoretic test for dependence. Polya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well known advantages of having an explicit probability measure include: easy comparison of evidence across different studies; encoding prior information; quantifying changes in dependence across different experimental conditions; and the integration of results within formal decision analysis.
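To make the idea of an explicit, analytic evidence measure concrete, the following is a minimal sketch of a Polya-tree-style Bayes factor for dependence versus independence. It is an illustration under stated assumptions, not the authors' implementation. The assumptions are: the data have already been mapped to the unit square; each cell is split into four equal quadrants; the dependence model places a Dirichlet(a, a, a, a) prior on the quadrant probabilities; the independence model places independent Beta(2a, 2a) priors on the x-split and y-split; and a = c * depth^2. The function names, the truncation depth and the hyperparameter c are hypothetical choices made for this example.

```python
import math


def log_multi_beta(alphas):
    """Log of the multivariate Beta function B(alphas)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def log_bayes_factor(points, max_depth=4, c=1.0):
    """Log Bayes factor (dependence vs independence) for points in [0, 1)^2.

    Assumed setup (illustrative only): quadrant splits, Dirichlet(a, a, a, a)
    under dependence, independent Beta(2a, 2a) splits under independence,
    with a = c * depth**2 and the tree truncated at max_depth.
    """
    def recurse(pts, x0, y0, size, depth):
        # With this prior choice, a cell holding at most one point has the
        # same marginal likelihood under both models, so it contributes zero.
        if depth > max_depth or len(pts) <= 1:
            return 0.0
        half = size / 2.0
        quads = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
        for x, y in pts:
            quads[(int(x >= x0 + half), int(y >= y0 + half))].append((x, y))
        n = {k: len(v) for k, v in quads.items()}
        a = c * depth ** 2

        # Dependence: Dirichlet-multinomial marginal over the four quadrants.
        log_dep = (log_multi_beta([a + n[k] for k in quads])
                   - log_multi_beta([a] * 4))

        # Independence: product of two Beta-binomial marginals (x and y splits).
        nx = [n[(0, 0)] + n[(0, 1)], n[(1, 0)] + n[(1, 1)]]
        ny = [n[(0, 0)] + n[(1, 0)], n[(0, 1)] + n[(1, 1)]]
        log_ind = sum(
            log_multi_beta([2 * a + m[0], 2 * a + m[1]])
            - log_multi_beta([2 * a, 2 * a])
            for m in (nx, ny)
        )

        total = log_dep - log_ind
        for (i, j), sub in quads.items():
            total += recurse(sub, x0 + i * half, y0 + j * half, half, depth + 1)
        return total

    return recurse(list(points), 0.0, 0.0, 1.0, 1)


def posterior_prob_dependence(log_bf, prior_dep=0.5):
    """Posterior probability of dependence given a log Bayes factor."""
    log_odds = log_bf + math.log(prior_dep) - math.log(1.0 - prior_dep)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Toy data: points on the diagonal (strongly dependent) and a regular
# 4 x 4 product grid (no dependence between coordinates).
diag = [((i + 0.5) / 16, (i + 0.5) / 16) for i in range(16)]
grid = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]

print(posterior_prob_dependence(log_bayes_factor(diag)))
print(posterior_prob_dependence(log_bayes_factor(grid)))
```

In this sketch the diagonal sample yields a positive log Bayes factor and the product grid a negative one, so the posterior probability of dependence moves above or below the prior value accordingly. Because the output is a probability, it can be compared across data sets or combined with a loss function in a decision analysis, which is the advantage the abstract emphasises.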
Highlights
Quantifying the evidence for dependence or testing for departures from independence between random variables is an increasingly important task and has been the focus of a number of studies in the past decade
We propose a Bayesian nonparametric procedure to derive a probabilistic measure of dependency between two samples x and y without assuming a known form for the underlying distributions
Polya tree priors have previously been used to derive Bayesian nonparametric procedures for two-sample hypothesis testing (Holmes et al 2015; Ma and Wong 2011) and extensions of these priors have been proposed to model distributions indexed by covariates (Trippa et al 2011)
Summary
Quantifying the evidence for dependence or testing for departures from independence between random variables is an increasingly important task and has been the focus of a number of studies in the past decade. In order to unravel the existing relationships between different molecular species (genes, proteins, ...) involved in a biological system, large datasets are commonly screened for evidence of association between pairs of variables. This requires adequate statistical procedures to quantify the evidence of dependence (or lack of independence) between two samples of typically continuous random variables. Polya tree priors have previously been used to derive Bayesian nonparametric procedures for two-sample hypothesis testing (Holmes et al 2015; Ma and Wong 2011) and extensions of these priors have been proposed to model distributions indexed by covariates (Trippa et al 2011). In the Appendix we provide an empirical calibration comparing our method to other non-Bayesian approaches in the literature.