Abstract

Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.

Highlights

  • Measuring the strength of an association between two random variables is a fundamental topic of broad scientific interest

  • We provide a novel interpretation of the hypercontractivity coefficient as a measure of potential correlation by demonstrating that it satisfies a natural set of axioms such a measure is expected to obey

  • We show applications of our estimator of the hypercontractivity coefficient to two important datasets. In Section 4.2, we demonstrate that it discovers hidden potential correlations among various national indicators in World Health Organization (WHO) datasets, including how aid is potentially correlated with income growth

Summary

Introduction

Measuring the strength of an association between two random variables is a fundamental topic of broad scientific interest. We make this intuition precise: we formally define a natural notion of potential correlation (Axiom 6) and show that the rate of information bottleneck s(X; Y) captures this potential correlation (Theorem 1), while other standard measures of correlation fail to do so (Theorem 2). This ratio has only recently been identified as the hypercontractivity coefficient [11]. We prove that existing standard measures of correlation fail to satisfy the proposed axioms and fail to capture the canonical examples of potential correlations that s(X; Y) captures (Section 2.3). Another natural candidate is mutual information, but it is not clear how to interpret its value because it is unnormalized, unlike the other measures of correlation, which lie between zero and one. We show empirically that the estimator of the hypercontractivity coefficient recovers this order accurately from a vastly smaller number of samples than other state-of-the-art causal influence estimators.
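To make the ratio behind s(X; Y) concrete, here is a minimal sketch for small discrete distributions. It relies on a characterization from the hypercontractivity literature that this summary does not spell out, so treat it as an assumption: s(X; Y) is the supremum, over input distributions r_x different from p_x, of D(r_y || p_y) / D(r_x || p_x), where r_y is the output distribution obtained by passing r_x through the channel p(y|x). The function names and the random-search strategy are hypothetical illustrations. This is not the sample-based estimator proposed in the paper, and it only yields a lower bound on the supremum.

```python
import math
import random


def kl(r, p):
    """KL divergence D(r || p) in nats for discrete distributions (lists)."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)


def kl_ratio_lower_bound(p_xy, n_trials=2000, seed=0):
    """Lower-bound s(X;Y) by random search over alternative inputs r_x.

    p_xy is a joint probability table (rows index x, columns index y).
    Assumes every row sum and every column sum is strictly positive.
    """
    rng = random.Random(seed)
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    # channel[i][j] = p(y = j | x = i)
    channel = [[v / px for v in row] for row, px in zip(p_xy, p_x)]

    best = 0.0
    for _ in range(n_trials):
        # Random point on the simplex (normalized exponentials).
        w = [rng.expovariate(1.0) for _ in p_x]
        total = sum(w)
        # Mix it with p_x so that r_x is a perturbation of p_x.
        eps = rng.uniform(0.01, 0.5)
        r_x = [(1 - eps) * px + eps * wi / total for px, wi in zip(p_x, w)]

        d_x = kl(r_x, p_x)
        if d_x < 1e-8:
            continue  # skip near-identical r_x to avoid numerical noise

        # Push r_x through the channel to get the induced output marginal.
        r_y = [
            sum(r_x[i] * channel[i][j] for i in range(len(p_x)))
            for j in range(len(p_y))
        ]
        best = max(best, kl(r_y, p_y) / d_x)
    return best


if __name__ == "__main__":
    # Uniform binary input through a binary symmetric channel, flip prob. 0.1.
    delta = 0.1
    p_xy = [[0.5 * (1 - delta), 0.5 * delta],
            [0.5 * delta, 0.5 * (1 - delta)]]
    print(kl_ratio_lower_bound(p_xy))
```

For the binary symmetric example, the known closed form is (1 - 2δ)² = 0.64, and the search should approach that value from below. For an independent pair the ratio stays at zero up to floating-point noise, matching the idea that there is no correlation to discover.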

Axiomatic Approach to Measure Potential Correlations
Axioms for Potential Correlation
The Hypercontractivity Coefficient Satisfies All Axioms
Standard Correlation Coefficients Violate the Axioms
Mutual Information Violates the Axioms
Hypercontractivity Ribbon
Multidimensional X and Y
Estimator of the Hypercontractivity Coefficient from Samples
Experimental Results
Synthetic Data
Real Data
How Hypercontractivity Changes as We Remove Outliers
Hypercontractivity Detecting an Outlier
Gene Pathway Recovery From Single Cell Data
Proof of Proposition 1
Proof of Theorem 1
Proof of Theorem 2
Proof of Proposition 2
Noisy Discrete Rare Correlation in Example 3
Proof of Proposition 4
Proof of Theorem 3
Proof of Lemma 2