Using Ramsey Theory to Measure Unavoidable Spurious Correlations in Big Data

Micheal Pawliuk, Michael Alexander Waddell

doi:10.3390/axioms8010029

Abstract

Given a dataset, we quantify the size of patterns that must always exist in the dataset. This is done formally through the lens of Ramsey theory of graphs, and a quantitative bound known as Goodman’s theorem. By combining statistical tools with Ramsey theory of graphs, we give a nuanced understanding of how far away a dataset is from correlated, and what qualifies as a meaningful pattern. This method is applicable to a wide range of datasets. As examples, we analyze two very different datasets. The first is a dataset of repeated voters (n = 435) in the 1984 US Congress, and we quantify how homogeneous a subset of congressional voters is. We also measure how transitive a subset of voters is. Statistical Ramsey theory is also used with global economic trading data (n = 214) to provide evidence that global markets are quite transitive. While these datasets are small relative to Big Data, they illustrate the new applications we are proposing. We end with specific calls to strengthen the connections between Ramsey theory and statistical methods.

Highlights

In the realm of data science, the conventional wisdom is that “more data is always better”, but is this the case? As a dataset D becomes larger, Ramsey theory describes the mathematical conditions by which disorder becomes impossible.
[...] beyond the base requirement that there is a single shirt that must be worn twice in a given week. This leads to our major connection between Ramsey theory and statistical analysis: Remark 1 (Spurious Correlations through Ramsey theory).
While the expected value is a good benchmark, it still doesn’t answer the more fundamental question of how many monochromatic triangles are present in G_N versus how many are required by Ramsey theory.

Summary

Introduction

In the realm of data science, the conventional wisdom is that “more data is always better”, but is this the case? As a dataset D becomes larger, Ramsey theory describes the mathematical conditions by which disorder becomes impossible. [...] It would be incorrect to conclude that the given person has a particular affinity for that repeated shirt. In this case, there is no meaningful conclusion we can draw, despite the natural human desire to attribute meaning to a pattern that is observed but forced to exist by the pigeonhole principle. [...] beyond the base requirement that there is a single shirt that must be worn twice in a given week. This leads to our major connection between Ramsey theory and statistical analysis: Remark 1 (Spurious Correlations through Ramsey theory). [...] Translating the Ramsey-theoretic result known as Goodman’s theorem into a measurement of the transitivity of a system (Theorem 2). In order for these connections to be further used and explored, we take care to explain the Ramsey theory we use in language that a data scientist without training in Ramsey theory will understand.
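To make the comparison between "observed" and "unavoidable" patterns concrete, the following is a minimal Python sketch, not the authors' code. It assumes a dataset has already been turned into a red/blue colouring of the complete graph on n vertices (for example, red for "similar" pairs and blue for "dissimilar" pairs; that encoding is an assumption here). The function names are hypothetical. The sketch computes Goodman's lower bound on the number of monochromatic triangles, the expected number under a uniformly random colouring, and a brute-force count for a given colouring.

```python
from itertools import combinations
from math import comb


def goodman_lower_bound(n: int) -> int:
    """Minimum number of monochromatic triangles forced in ANY
    red/blue colouring of the edges of K_n (Goodman's theorem)."""
    if n % 2 == 0:                      # n = 2u
        u = n // 2
        return u * (u - 1) * (u - 2) // 3
    if n % 4 == 1:                      # n = 4u + 1
        u = (n - 1) // 4
        return 2 * u * (u - 1) * (4 * u + 1) // 3
    u = (n - 3) // 4                    # n = 4u + 3
    return 2 * u * (u + 1) * (4 * u - 1) // 3


def expected_random_triangles(n: int) -> float:
    """Expected monochromatic triangles when every edge is coloured
    red or blue independently with probability 1/2."""
    return comb(n, 3) / 4


def count_monochromatic_triangles(n: int, red_edges) -> int:
    """Brute-force count; any pair not listed in red_edges is blue."""
    red = {frozenset(e) for e in red_edges}
    count = 0
    for a, b, c in combinations(range(n), 3):
        colours = {frozenset(p) in red for p in ((a, b), (a, c), (b, c))}
        if len(colours) == 1:           # all three edges share a colour
            count += 1
    return count


# Illustrative colouring (hypothetical data): red edges form a 5-cycle on K_5.
pentagon = [(i, (i + 1) % 5) for i in range(5)]
observed = count_monochromatic_triangles(5, pentagon)
forced = goodman_lower_bound(5)
print(observed, forced, expected_random_triangles(5))
```

Only the gap between the observed count and the forced count can carry information about the data. Triangles up to the Goodman bound would appear in any dataset of that size, however it was generated. The brute-force count is cubic in n, which is manageable at the scale of the datasets mentioned above (a few hundred vertices) but would need a faster method for larger graphs.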

Mathematical Framework

The Ramsey Perspective

Models

Similarity in Voting Records

Theoretical Construction

Defining Deviation

Applied to Voting Threshold Graphs

Collaboration Model

Applications to Other Datasets

Applications to Transitivity

Application to Voting Records

Application to Global Trading Data

Theory Building

Further Applications

Findings

Closing Remarks

Journal: Axioms 2019, 8, 29
Publication Date: Mar 5, 2019
Citations: 1
License type: CC BY 4.0
