Abstract

The common criteria for evaluating spectral clustering are NCut and RatioCut. The seemingly unrelated column subset selection (CSS) problem aims to compute a column subset that linearly approximates the entire matrix. A common criterion is the approximation error in the Frobenius norm (ApproxErr). We show that any algorithm for CSS can be viewed as a clustering algorithm that minimizes NCut by applying it to a matrix formed from graph edges. Conversely, any clustering algorithm can be seen as identifying a column subset from that matrix. In both cases, ApproxErr and NCut have the same value. Analogous results hold for RatioCut with a slightly different matrix. Therefore, established results for CSS can be mapped to spectral clustering. We use this to obtain new clustering algorithms, including an optimal one that is similar to A*. This is the first nontrivial clustering algorithm with such an optimality guarantee. A variant of the weighted A* runs much faster and provides bounds on the accuracy. Finally, we use the results from spectral clustering to prove the NP-hardness of CSS from sparse matrices.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call