Abstract

We determine the information-theoretic cutoff value on the separation of cluster centers for exact recovery of cluster labels in a K-component Gaussian mixture model with equal cluster sizes. Moreover, we show that a semidefinite programming (SDP) relaxation of the K-means clustering method achieves this sharp threshold for exact recovery without assuming symmetry of the cluster centers.
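The setting in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: it draws an equal-size sample around K centers and computes the minimal separation between centers, the quantity the cutoff is stated in. The function names, the example centers, and the spherical noise N(mu_k, sigma^2 I_p) are assumptions made for illustration.

```python
import itertools
import math
import random


def sample_mixture(centers, n_per_cluster, sigma=1.0, seed=0):
    """Draw n_per_cluster points around each center.

    Assumption: spherical Gaussian noise N(mu_k, sigma^2 * I_p).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, mu in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([m + rng.gauss(0.0, sigma) for m in mu])
            labels.append(k)
    return points, labels


def min_separation(centers):
    """Smallest Euclidean distance between two distinct cluster centers."""
    return min(math.dist(a, b) for a, b in itertools.combinations(centers, 2))


# Hypothetical example: K = 3 centers in R^2, 4 points per cluster.
centers = [[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]]
points, labels = sample_mixture(centers, n_per_cluster=4)
delta = min_separation(centers)
```

Here `delta` is the smallest pairwise distance among the three example centers. Exact recovery means that an estimator returns the true partition encoded by `labels`, up to a relabeling of the clusters.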

Highlights

  • Let X_1, ..., X_n be a sequence of independent random vectors in R^p sampled from a K-component Gaussian mixture model with K ≤ n.

  • It should be noted that the algorithm in [52] critically depends on the symmetry of the Gaussian centers (i.e., μ and −μ), and it is structurally difficult to extend such an algorithm, while maintaining statistical optimality, to a general K-component Gaussian mixture model without assuming the centers are spaced

  • We provide an affirmative answer to this question: we show that there is a semidefinite programming (SDP) relaxation of the K-means clustering method (given in (11) below) achieving exact recovery with high probability if Δ² ≥ (1 + α)Δ̄², where

Summary

Introduction

Let X_1, ..., X_n be a sequence of independent random vectors in R^p sampled from a K-component Gaussian mixture model. It should be noted that the algorithm in [52] critically depends on the symmetry of the Gaussian centers (i.e., μ and −μ), and it is structurally difficult to extend such an algorithm, while maintaining statistical optimality, to a general K-component Gaussian mixture model without assuming the centers are spaced ... Another active line of research focuses on various convex relaxations of the K-means problem that are solvable in polynomial time [57, 49, 42, 23, 59, 27, 12]. The exponential rate implies that exact recovery is achieved by the SDP relaxed K-means with high probability in the equal cluster size case n_k = n/K if the minimal separation of cluster centers satisfies the lower bound.
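For orientation, the K-means objective and its SDP relaxation in the form commonly attributed to Peng and Wei can be written as below. This is a sketch under assumptions: the symbols A, Z, G_k and 1_n are introduced here for illustration, and whether this display agrees term by term with the paper's formulation (11) has not been checked against the full text.

```latex
% K-means over partitions G_1, ..., G_K of {1, ..., n}
\max_{G_1, \dots, G_K} \; \sum_{k=1}^{K} \frac{1}{|G_k|} \sum_{i, j \in G_k} X_i^{\top} X_j

% SDP relaxation, with affinity matrix A = (X_i^{\top} X_j)_{i,j=1}^{n}
\hat{Z} = \arg\max_{Z \in \mathbb{R}^{n \times n}} \; \langle A, Z \rangle
\quad \text{subject to} \quad
Z \succeq 0, \quad \operatorname{tr}(Z) = K, \quad Z \mathbf{1}_n = \mathbf{1}_n, \quad Z \ge 0 \ \text{(entrywise)}
```

Any partition gives a feasible point Z = \sum_k |G_k|^{-1} 1_{G_k} 1_{G_k}^T, so the relaxation enlarges the K-means feasible set; exact recovery means that the SDP solution coincides with the Z built from the true partition.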

Main result
Semidefinite programming relaxation: primal and dual
Discussions
Proof of key lemmas
Supporting lemmas