Abstract

Dependence measures and tests for independence have recently attracted a lot of attention because they are the cornerstone of algorithms for network inference in probabilistic graphical models. Pearson's product-moment correlation coefficient is still by far the most widely used statistic, yet it is largely constrained to detecting linear relationships. In this work we provide an exact formula for the ith nearest neighbor distance distribution of rank-transformed data. Based on that, we propose two novel tests for independence. An implementation of these tests, together with a general benchmark framework for independence testing, is freely available as a CRAN software package (http://cran.r-project.org/web/packages/knnIndep). In this paper we benchmark Pearson's correlation, Hoeffding's D, dcor, Kraskov's estimator for mutual information, the maximal information coefficient (MIC), and our two tests. We conclude that no particular method is generally superior to all other methods. However, dcor and Hoeffding's D are the most powerful tests for many different types of dependence.

Highlights

  • Dependence measures and tests for independence have recently attracted a lot of attention, because they are the cornerstone of algorithms for network inference in probabilistic graphical models

  • In this work we provide an exact formula for the ith nearest neighbor distance distribution of rank-transformed data (i = 1, 2, ...)

  • We have derived an exact formula for the distribution of the distances of the ith nearest neighbor of a given point


Summary

Introduction

Dependence measures and tests for independence have recently attracted a lot of attention because they are the cornerstone of algorithms for network inference in probabilistic graphical models. Dcor and Kraskov's estimator use the pairwise distances of the points in a sample as a sufficient statistic. We derive the distribution of the (i+1)th nearest neighbor of a point, as well as its conditional distribution given the distance to its previous neighbors. We derive a general formula for all admissible configurations in the case of c = a, $P(D_{i_0+r+1} > c,\; D_{i_0+r} = \cdots = D_{i_0+1} = c,\; D_{i_0} < c)$.
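To make the quantities above concrete, the following minimal Python sketch rank-transforms a bivariate sample and computes, for every point, the sorted distances to all other points. It is an illustration only and not the knnIndep implementation: the function names `rank_transform` and `sorted_neighbor_distances` are hypothetical, the maximum norm is an assumption of this sketch, and the input is assumed to contain no tied values.

```python
import numpy as np

def rank_transform(values):
    """Replace each value by its rank 1..n (assumes no ties in the input)."""
    ranks = np.empty(len(values), dtype=int)
    ranks[np.argsort(values)] = np.arange(1, len(values) + 1)
    return ranks

def sorted_neighbor_distances(x, y):
    """Row k holds the distances from point k to all other points, ascending.

    Distances are maximum-norm distances between rank-transformed points;
    the choice of norm is an assumption of this sketch.
    """
    rx, ry = rank_transform(x), rank_transform(y)
    dx = np.abs(rx[:, None] - rx[None, :])
    dy = np.abs(ry[:, None] - ry[None, :])
    dist = np.maximum(dx, dy)
    # Sort each row and drop the first column (distance 0 to the point itself).
    return np.sort(dist, axis=1)[:, 1:]

# Independent toy data; the seed and sample size are arbitrary choices.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

d = sorted_neighbor_distances(x, y)
# d[k, i - 1] is the distance from point k to its ith nearest neighbor.
first_nn = d[:, 0]
# Check whether any point has two consecutive neighbors at the same distance.
has_tie = bool(np.any(d[:, 1:] == d[:, :-1]))
```

Because ranks are integers, the distances in this sketch are integers as well, so several consecutive neighbors of a point can lie at exactly the same distance c. Under these assumptions, that is the kind of configuration the probability above describes, with equal distances for the neighbors i_0+1 through i_0+r.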
