Abstract

Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular, the emergence of large data sets can now support the relaxation of the linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Polya tree priors on the space of probability measures, which can then be embedded within a decision theoretic test for dependence. Polya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well-known advantages of having an explicit probability measure include easy comparison of evidence across different studies; encoding of prior information; quantification of changes in dependence across different experimental conditions; and the integration of results within formal decision analysis.

Highlights

  • Quantifying the evidence for dependence or testing for departures from independence between random variables is an increasingly important task and has been the focus of a number of studies in the past decade

  • We propose a Bayesian nonparametric procedure to derive a probabilistic measure of dependency between two samples x and y without assuming a known form for the underlying distributions

  • Polya tree priors have previously been used to derive Bayesian nonparametric procedures for two-sample hypothesis testing (Holmes et al 2015; Ma and Wong 2011) and extensions of these priors have been proposed to model distributions indexed by covariates (Trippa et al 2011)


Summary

Introduction

Quantifying the evidence for dependence or testing for departures from independence between random variables is an increasingly important task and has been the focus of a number of studies in the past decade. In order to unravel the existing relationships between different molecular species (genes, proteins, ...) involved in a biological system, large datasets are commonly screened for evidence of association between pairs of variables. This requires adequate statistical procedures to quantify the evidence of dependence (or lack of independence) between two samples of typically continuous random variables. Polya tree priors have previously been used to derive Bayesian nonparametric procedures for two-sample hypothesis testing (Holmes et al 2015; Ma and Wong 2011) and extensions of these priors have been proposed to model distributions indexed by covariates (Trippa et al 2011). In the Appendix we provide an empirical calibration comparing our method with other non-Bayesian approaches in the literature.
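To give a feel for how a Polya tree prior can yield a closed-form measure of evidence for dependence, the following Python sketch computes a log Bayes factor on a recursive quadrant partition of the unit square. It is an illustration under stated assumptions, not the procedure derived in the paper: the function names, the midpoint partition, the prior parameters c * depth^2, the Beta(2a, 2a) marginal priors under independence, and the truncation depth are all choices made here for the example. The data are assumed to have already been mapped to [0, 1) x [0, 1).

```python
import math


def log_multi_beta(alphas):
    """Log of the multivariate Beta function B(alpha_1, ..., alpha_k)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def log_bayes_factor(points, c=1.0, max_depth=6, depth=1,
                     x0=0.0, y0=0.0, size=1.0):
    """Log Bayes factor (dependence vs independence) for 2-D points.

    Hypothetical sketch: every square is split into four quadrants at its
    midpoint. Under dependence the quadrant probabilities get a
    Dirichlet(a, a, a, a) prior; under independence the left/right and
    bottom/top splits get independent Beta(2a, 2a) priors, with
    a = c * depth**2 (an assumed, commonly used choice).
    """
    n = len(points)
    # A node holding 0 or 1 points contributes exactly 0 to the log BF.
    if n <= 1 or depth > max_depth:
        return 0.0

    half = size / 2.0
    xm, ym = x0 + half, y0 + half

    # Quadrant index: 2 * (right half?) + (top half?)
    quads = [[], [], [], []]
    for x, y in points:
        quads[2 * (x >= xm) + (y >= ym)].append((x, y))
    counts = [len(q) for q in quads]

    a = c * depth ** 2
    # Marginal likelihood of the quadrant counts under dependence.
    log_dep = log_multi_beta([a + k for k in counts]) - log_multi_beta([a] * 4)

    # Marginal likelihood under independence: two separate binary splits.
    n_left, n_right = counts[0] + counts[1], counts[2] + counts[3]
    n_bottom, n_top = counts[0] + counts[2], counts[1] + counts[3]
    log_ind = (
        log_multi_beta([2 * a + n_left, 2 * a + n_right])
        + log_multi_beta([2 * a + n_bottom, 2 * a + n_top])
        - 2 * log_multi_beta([2 * a, 2 * a])
    )

    total = log_dep - log_ind
    # Recurse into each quadrant; i // 2 is the x half, i % 2 the y half.
    for i, q in enumerate(quads):
        total += log_bayes_factor(q, c, max_depth, depth + 1,
                                  x0 + half * (i // 2), y0 + half * (i % 2),
                                  half)
    return total


# Points lying on the diagonal y = x (strong dependence).
diagonal = [((i + 0.5) / 64, (i + 0.5) / 64) for i in range(64)]
# Points on a regular 8 x 8 grid (no visible dependence).
grid = [((i + 0.5) / 8, (j + 0.5) / 8) for i in range(8) for j in range(8)]

print("log BF, diagonal:", log_bayes_factor(diagonal))
print("log BF, grid:    ", log_bayes_factor(grid))
```

In this sketch a positive log Bayes factor favours dependence and a negative one favours independence. With equal prior odds, the posterior probability of dependence would be 1 / (1 + exp(-log BF)), which is the kind of explicit probability measure the abstract refers to.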

Polya Tree priors
The approach
Sensitivity to choice of A
Choice of the partition
Applications
Illustration for simple datasets
Applications from molecular biology
Details on derivation of the Bayes Factor
Other approaches