Abstract

Testing homogeneity of k (≥ 2) multivariate distributions is a challenging problem in statistics, especially when the dimension of the data is much larger than the sample size. Most existing tests perform poorly in this high dimension, low sample size (HDLSS) regime, and many of them cannot be used at all. In this article, we propose some nonparametric tests for this purpose. These tests are distribution-free in finite samples. They are based on a high-dimensional clustering algorithm that partitions the data to form a contingency table, and the test statistics are constructed from the cell frequencies of that table. The tests can be based on a k-partition of the data, or the number of partitions can be estimated from the data and the tests constructed from the resulting partition. Under appropriate regularity conditions, we prove the consistency of these tests in the HDLSS asymptotic regime. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. An extensive simulation study and analyses of some benchmark datasets illustrate the superiority of the proposed tests over some existing methods.
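The contingency-table step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the clustering algorithm or the exact statistic, so the sketch assumes the cluster labels are already available from some clustering of the pooled data, uses the Pearson chi-square statistic as a stand-in, and calibrates it by permuting the sample labels. All function names are hypothetical.

```python
import random
from collections import Counter


def contingency_table(sample_labels, cluster_labels):
    """Cross-tabulate sample membership (rows) against cluster membership (columns)."""
    rows = sorted(set(sample_labels))
    cols = sorted(set(cluster_labels))
    counts = Counter(zip(sample_labels, cluster_labels))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square_statistic(table):
    """Pearson chi-square statistic computed from the cell frequencies.

    Assumption: this statistic is a stand-in; the abstract does not say
    which function of the cell frequencies the proposed tests use.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(sample_labels, cluster_labels, n_perm=2000, seed=0):
    """Approximate p-value obtained by shuffling sample labels, clusters held fixed.

    Assumption: the clustering was computed on the pooled data without using
    the sample labels, so the labels are exchangeable under homogeneity.
    """
    rng = random.Random(seed)
    observed = chi_square_statistic(contingency_table(sample_labels, cluster_labels))
    shuffled = list(sample_labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = chi_square_statistic(contingency_table(shuffled, cluster_labels))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy input: two samples of size 10 and a 2-cluster partition
# that mostly separates them.
sample_labels = [0] * 10 + [1] * 10
cluster_labels = [0] * 9 + [1] + [1] * 9 + [0]
table = contingency_table(sample_labels, cluster_labels)
statistic, p_value = permutation_p_value(sample_labels, cluster_labels)
```

Because the statistic depends on the data only through the table of counts, its permutation distribution is determined by the table margins rather than by the underlying distributions, which is one way a finite-sample distribution-free property can arise.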
