Abstract
In this paper, we test whether two data sets measured on the same set of subjects share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two deterministic two-component mixtures of multivariate Gaussian distributions. Mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as ambient dimensions of the data sets grow at a sublinear rate with the sample size.
Submitted Version (
Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have