Abstract

Testing for independence plays a fundamental role in many statistical techniques. Among nonparametric approaches, distance-based methods (such as the distance correlation-based hypothesis test for independence) have many advantages over the alternatives. A known limitation of distance-based methods is their computational cost: for a sample of size n, a distance-based method typically requires computing all pairwise distances, so its computational complexity can be O(n^2). Recent advances have shown that in the univariate case, a fast method with O(n log n) computational complexity and O(n) memory requirement exists. In this paper, we introduce a test of independence based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the multivariate case, and has O(nK log n) computational complexity and O(max{n, K}) memory requirement, where K is the number of random projections. Computational savings over the O(n^2) direct method are achieved when K < n/log n. We name our method Randomly Projected Distance Covariance (RPDC). The theoretical analysis takes advantage of random projection techniques that are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method relative to numerous competitors.

Highlights

  • Testing for independence is a fundamental problem in statistics, with much existing work including the maximal information coefficient (MIC) [1], copula-based measures [2,3], the kernel-based criterion [4], and the distance correlation [5,6], which motivated our current work

  • The test procedure is motivated by the observation that the asymptotic distribution of the test statistic nΩn can be approximated by a Gamma distribution, whose parameters can be estimated by Eq 3.2 and Eq 3.3

  • The break-even sample size decreases as the data dimension increases, which implies that our proposed method is more advantageous than the direct method when the random variables are high-dimensional


Summary

INTRODUCTION

Testing for independence is a fundamental problem in statistics, with much existing work including the maximal information coefficient (MIC) [1], copula-based measures [2,3], the kernel-based criterion [4], and the distance correlation [5,6], which motivated our current work. When the random variables are univariate, there exist efficient numerical algorithms with O(n log n) time complexity. For multivariate random variables, we have not found any efficient algorithms in existing papers after an extensive literature survey. This work addresses the need for numerically efficient independence tests for multivariate random variables. The newly proposed test of independence between two (potentially multivariate) random variables X and Y works as follows. Both X and Y are randomly projected to one-dimensional spaces. The distance covariance of each projected pair is then computed with the fast univariate algorithm, and the results are averaged over the K random projections. We will show (in Theorem 3.1) that the newly proposed algorithm has O(Kn log n) computational complexity and O(max{n, K}) memory requirement, where K is the number of random projections and n is the sample size.
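
The projection-and-average idea can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions. The function names are hypothetical. Projection directions are drawn uniformly on the unit sphere by normalizing Gaussian vectors. The univariate distance covariance is computed with the direct O(n^2) double-centering formula (the V-statistic form) rather than the fast O(n log n) algorithm. Any normalizing constants or bias corrections the paper applies to the average are omitted.

```python
import numpy as np


def dcov_sq_univariate(u, v):
    """Squared sample distance covariance of two 1-D samples.

    Direct O(n^2) V-statistic form, used here only for clarity; the fast
    O(n log n) univariate algorithm is NOT implemented in this sketch.
    """
    a = np.abs(u[:, None] - u[None, :])  # pairwise distances within u
    b = np.abs(v[:, None] - v[None, :])  # pairwise distances within v
    # Double-center each distance matrix (subtract row/column means, add grand mean).
    a_c = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    b_c = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((a_c * b_c).mean())


def random_unit_vectors(k, dim, rng):
    """Draw k directions uniformly on the unit sphere in R^dim (assumed scheme)."""
    g = rng.standard_normal((k, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def averaged_projected_dcov(x, y, k, rng):
    """Average the univariate squared distance covariance over k random projections.

    x has shape (n, p), y has shape (n, q). Normalizing constants are omitted,
    so the value is only meaningful for comparison, not as a calibrated statistic.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    us = random_unit_vectors(k, x.shape[1], rng)
    vs = random_unit_vectors(k, y.shape[1], rng)
    values = [dcov_sq_univariate(x @ u, y @ v) for u, v in zip(us, vs)]
    return float(np.mean(values))
```

A call such as `averaged_projected_dcov(x, y, k=20, rng=np.random.default_rng(0))` returns one number per data set. In this sketch each projection costs O(n^2), so it shows the structure of the method but not its speed; the complexity stated in the text requires replacing `dcov_sq_univariate` with the fast univariate algorithm.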

REVIEW OF DISTANCE COVARIANCE
Definition of Distance Covariances
Fast Algorithm in the Univariate Cases
Distance Based Independence Tests
NUMERICALLY EFFICIENT METHOD FOR RANDOM VECTORS
Random Projection Based Methods for Approximating Distance Covariance
Test of Independence
THEORETICAL PROPERTIES
Using Random Projections in Distance-Based Methods
Asymptotic Properties of the Sample Distance Covariance Ωn
Properties of Eigenvalues λi’s
Asymptotic Properties of Averaged Projected Sample Distance Covariance Ωn
SIMULATIONS
Comparison With Direct Method
Comparison With Other Independence Tests
A Discussion on the Computational Efficiency
Connections With Existing Literature
CONCLUSION
DATA AVAILABILITY STATEMENT