Abstract

Detecting the association between two variables is both necessary and meaningful in the era of big data. Many measures exist for this purpose: some detect linear association, e.g., the simple and fast Pearson correlation coefficient, while others detect nonlinear association, e.g., the maximal information coefficient (MIC), which is computationally expensive and imprecise. In this study, we propose a novel maximal association coefficient (MAC), based on the idea that any nonlinear association can be regarded as a composition of piecewise-linear ones; MAC detects linear or nonlinear association between two variables using the Pearson coefficient. Experiments on simulated data show that MAC has both generality and equitability. We also apply MAC to two real datasets, the major-league baseball dataset from Baseball Prospectus and a dataset on credit card clients' default, to measure the strength of association between pairs of variables in each. The experimental results show that MAC can detect the association between two variables and that it is computationally less expensive and more precise than MIC, which may be important for follow-up data analysis and for the conclusions drawn from it.
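
The piecewise-linear idea can be illustrated with a small sketch. This is not the authors' MAC definition, which the abstract does not give; it is only a toy illustration, assuming equal-count segments along the sorted x-axis and a size-weighted average of absolute Pearson coefficients. The function names `pearson_r` and `piecewise_abs_pearson` and the parameter `n_segments` are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for degenerate input."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2:
        return 0.0
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        return 0.0  # a constant segment carries no linear signal
    return float((xc * yc).sum() / denom)


def piecewise_abs_pearson(x, y, n_segments=2):
    """Toy score (an assumption, not MAC): sort by x, split into
    n_segments contiguous pieces of near-equal size, and average
    |Pearson r| over the pieces, weighted by piece size."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    total = 0.0
    for idx in np.array_split(order, n_segments):
        total += len(idx) * abs(pearson_r(x[idx], y[idx]))
    return total / len(x)


# A parabola on a symmetric grid: globally uncorrelated, but each
# half is close to linear.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
global_r = pearson_r(x, y)
piecewise_score = piecewise_abs_pearson(x, y, n_segments=2)
```

On this example the global Pearson coefficient is essentially zero, while the two-segment score is high, which is the intuition behind treating a nonlinear association as a set of piecewise-linear ones. How the segments should be chosen is exactly what a real method has to specify; the fixed split here is purely for illustration.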
