Abstract
We determine the information-theoretic cutoff value for the separation of cluster centers required for exact recovery of cluster labels in a K-component Gaussian mixture model with equal cluster sizes. Moreover, we show that a semidefinite programming (SDP) relaxation of the K-means clustering method achieves this sharp threshold for exact recovery without assuming symmetry of the cluster centers.
Highlights
Let X1, . . . , Xn be a sequence of independent random vectors in Rp sampled from a K-component Gaussian mixture model.
It should be noted that the algorithm in [52] critically depends on the symmetry of the Gaussian centers (i.e., μ and −μ), and it is structurally difficult to extend such an algorithm, while maintaining statistical optimality, to a general K-component Gaussian mixture model without assuming symmetrically placed centers.
We provide an affirmative answer to this question: we show that there is a semidefinite programming (SDP) relaxation of the K-means clustering method (given in (11) below) that achieves exact recovery with high probability whenever ∆² ≥ (1 + α)∆̄² for some α > 0, where ∆̄² denotes the information-theoretic threshold value.
Summary
Let X1, . . . , Xn be a sequence of independent random vectors in Rp sampled from a K-component Gaussian mixture model. The algorithm in [52] critically depends on the symmetry of the Gaussian centers (i.e., μ and −μ), and it is structurally difficult to extend it, while maintaining statistical optimality, to a general K-component Gaussian mixture model without assuming symmetrically placed centers. Another active line of research focuses on various convex relaxations of the K-means problem that are solvable in polynomial time [57, 49, 42, 23, 59, 27, 12]. The exponential rate implies that exact recovery is achieved by the SDP-relaxed K-means with high probability in the equal-cluster-size case n̄ = n/K, provided the minimal separation of the cluster centers satisfies the lower bound.