An improved algorithm for the maximal information coefficient and its application.

Dan Cao, Zheming Yuan, Yuan Chen, Hongyan Zhang, Jin Chen

doi:10.1098/rsos.201424

Abstract

The maximal information coefficient (MIC) captures both linear and nonlinear correlations between variable pairs. In this paper, we propose the BackMIC algorithm for MIC estimation. BackMIC adds a searching back process on the equipartitioned axis to obtain a better grid partition than the original implementation, the ApproxMaxMI algorithm. Like the ChiMIC algorithm, it terminates the grid search with the χ²-test instead of the maximum number of bins B(n, α). Results on simulated data show that BackMIC preserves the generality of MIC and, with comparable running times, gives more reasonable grid partitions and MIC values for both independent and dependent variable pairs. Moreover, it is robust to the choice of α in B(n, α). MIC values calculated by BackMIC also show improved statistical power and equitability. We used 1 − MIC as the distance measure in the K-means algorithm to cluster cancer/normal samples. On four cancer datasets, the MIC values calculated by BackMIC yielded better clustering results, indicating that the correlations between samples measured by BackMIC are more credible than those measured by the other algorithms.
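For reference, the standard MIC definition for a sample D of n points is restated below in common notation (it is not spelled out in this summary). Here I*(D, n_x, n_y) is the largest mutual information achievable by any grid with n_x columns and n_y rows on D, and B(n, α) = n^α is the bin budget whose role the abstract contrasts with the χ²-test; the clustering distance is the one named in the abstract.

```latex
\begin{aligned}
\mathrm{MIC}(D) &= \max_{n_x n_y < B(n,\alpha)} \frac{I^{*}(D, n_x, n_y)}{\log \min(n_x, n_y)},
  \qquad B(n,\alpha) = n^{\alpha},\\
d(u, v) &= 1 - \mathrm{MIC}(u, v).
\end{aligned}
```

Because the normalized score lies in [0, 1], the distance d also lies in [0, 1], with strongly associated sample pairs placed close together.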

Highlights

Correlation analysis has important applications in data mining, such as disease diagnosis [1,2], public management [3,4] and financial market analysis [5,6]
Based on an equipartition of n_y bins on one axis, the BackMIC algorithm uses dynamic programming to locate the partition of the x-axis that achieves the largest normalized mutual information under the restriction of the χ²-test, similar to the ChiMIC algorithm [13]
The BackMIC algorithm adds a searching back process to obtain an optimal partition for the equipartitioned axis, making it more likely to obtain the true maximal information coefficient (MIC) value
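The two steps in the highlights (dynamic programming on the x-axis given an equipartitioned y-axis, then searching back on the y-axis) can be illustrated with the minimal sketch below. It is not the authors' implementation: the function names are hypothetical, the χ²-test restriction is replaced by a fixed bin cap `max_x_bins`, tied values are assumed absent, `ny` must be at least 2, and "searching back" is interpreted here as re-optimizing the y partition by the same DP while the chosen x partition is held fixed.

```python
import math
import random
from collections import Counter


def rank_equipartition(values, k):
    """Label each point with a bin 0..k-1 so that bins hold (nearly) equal counts by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def optimize_axis(values, other_labels, max_bins):
    """Exact DP over points sorted by `values`: for each bin count b = 1..max_bins,
    find contiguous bins minimising H(other | bin). Assumes no tied values.
    Returns {b: (conditional_entropy, bin_labels)}."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    seq = [other_labels[i] for i in order]
    # cost[i][j]: weighted entropy of a bin holding sorted points i..j-1
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = (j - i) / n * entropy(seq[i:j])
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(max_bins + 1)]
    start = [[0] * (n + 1) for _ in range(max_bins + 1)]
    best[0][0] = 0.0
    for b in range(1, max_bins + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                v = best[b - 1][i] + cost[i][j]
                if v < best[b][j]:
                    best[b][j], start[b][j] = v, i
    result = {}
    for b in range(1, min(max_bins, n) + 1):
        labels, j = [0] * n, n
        for bb in range(b, 0, -1):  # walk the cut points backwards
            i = start[bb][j]
            for pos in range(i, j):
                labels[order[pos]] = bb - 1
            j = i
        result[b] = (best[b][n], labels)
    return result


def sketch_scores(x, y, ny, max_x_bins):
    """Return (forward, after_search_back) normalised-MI scores for one n_y (ny >= 2)."""
    ylab = rank_equipartition(y, ny)
    hy = entropy(ylab)
    forward = backward = 0.0
    for bx, (h_y_given_x, xlab) in optimize_axis(x, ylab, max_x_bins).items():
        if bx < 2:
            continue
        forward = max(forward, (hy - h_y_given_x) / math.log2(min(bx, ny)))
        # Search back: keep the x partition fixed and re-cut the y-axis freely.
        hx = entropy(xlab)
        for by, (h_x_given_y, _) in optimize_axis(y, xlab, ny).items():
            if by >= 2:
                backward = max(backward, (hx - h_x_given_y) / math.log2(min(bx, by)))
    return forward, max(forward, backward)


if __name__ == "__main__":
    random.seed(0)
    xs = [random.random() for _ in range(40)]
    print(sketch_scores(xs, [v * v for v in xs], ny=4, max_x_bins=4))
    print(sketch_scores(xs, [random.random() for _ in range(40)], ny=4, max_x_bins=4))
```

Since the rank equipartition is itself one of the contiguous y partitions the DP considers, re-cutting the y-axis can only match or raise the score for a given x partition; this is the intuition behind searching back on the equipartitioned axis.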

Summary

Introduction

Correlation analysis has important applications in data mining, such as disease diagnosis [1,2], public management [3,4] and financial market analysis [5,6]. In the original ApproxMaxMI algorithm, the grid search is bounded by the maximum number of bins B(n, α). If a low B(n, α) is set, the MIC can capture only simple correlation patterns; by contrast, a high B(n, α) produces non-zero scores even for independent variables [7]. To solve this problem, Chen et al. [13] proposed the ChiMIC algorithm (downloadable from https://github.com/chenyuan0510/Chi-MIC), in which one axis is equipartitioned and the partitioning of the other axis is terminated by the χ²-test.

Here, we propose an improved approximation algorithm, BackMIC, for MIC estimation. It adds a searching back process on the equipartitioned axis, which removes the equipartition restriction, and uses the χ²-test to control the search on both the y- and x-axes. Results on simulated and real data demonstrate that BackMIC measures the correlations of both independent and dependent variable pairs better than the AppMIC and ChiMIC algorithms.
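The χ²-test that replaces the B(n, α) limit is a test of independence on a contingency table of grid counts. The sketch below computes Pearson's χ² statistic and its upper-tail p-value in pure Python, then applies it in a hypothetical stopping rule (`split_is_significant`, our own name): a new x-axis cut is kept only if the y-bin counts of the two adjacent x-bins it creates differ significantly. The exact test placement and significance level used by ChiMIC and BackMIC are not given in this summary, so `alpha=0.05` and the two-column table are assumptions.

```python
import math


def chi2_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, dof


def chi2_sf(x, k):
    """P(X >= x) for a chi-squared variable with k degrees of freedom, computed as
    1 - P(k/2, x/2) from the power series of the regularized lower incomplete gamma.
    Adequate for the moderate statistics of small grids; not tuned for extreme tails."""
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def split_is_significant(left_counts, right_counts, alpha=0.05):
    """Hypothetical stopping rule: keep a new x-axis cut only if the y-bin counts of
    the two adjacent x-bins it creates differ significantly (chi-squared test)."""
    # Drop y-bins that are empty in both columns so every expected count is positive.
    keep = [j for j, (l, r) in enumerate(zip(left_counts, right_counts)) if l + r > 0]
    table = [[left_counts[j] for j in keep], [right_counts[j] for j in keep]]
    stat, dof = chi2_statistic(table)
    return dof > 0 and chi2_sf(stat, dof) < alpha


if __name__ == "__main__":
    print(split_is_significant([10, 0, 5], [0, 10, 5]))  # strongly different columns
    print(split_is_significant([5, 5, 5], [5, 5, 5]))    # identical columns
```

A data-driven rule of this kind stops refining the grid when further cuts no longer separate the distributions, instead of continuing up to a fixed bin budget, which is why it can avoid spurious non-zero scores for independent variables.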

Comparison of grids and estimated MICs for independent variable pairs

Comparison of grids and estimated MICs for dependent variable pairs

Comparison of robustness

Comparison of statistical power

Comparison of equitability

Comparison of computational cost

Simulated data

Real datasets

Methods

BackMIC algorithm

K-means clustering algorithm

Conclusion


Journal: Royal Society Open Science
Publication Date: Feb 1, 2021
Citations: 20
License type: CC BY

Similar Papers

Unveiling linearly and nonlinearly correlated signals between gravitational wave detectors and environmental monitors
Hirotaka Yuzurihara ... Shuhei Mano
Physical Review D | VOL. 94
29 Aug 2016

Analysis of generic coupling between EEG activity and PETCO2 in free breathing and breath-hold tasks using Maximal Information Coefficient (MIC)
Maria Sole Morelli ... Nicola Vanello
Scientific Reports | VOL. 8
14 Mar 2018

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
Chao Wang ... Xuehai Zhou
IEEE/ACM transactions on computational biology and bioinformatics | VOL. 14
05 Apr 2016

Initial rapidity of tumor growth as a prognostic factor for the therapeutic effect of immune-checkpoint inhibitors in patients with non-small cell lung cancer: evaluation for linear and non-linear correlation.
Kosuke Sakai ... Joji Kuramoto
Journal of thoracic disease | VOL. 13
01 Aug 2021
