Abstract
This work studies clustering algorithms that operate with ordinal or comparison-based queries (operations), a situation that arises in many active-learning applications where “dissimilarities” between data points are evaluated by humans. Typically, exact answers are costly (or difficult to obtain in large amounts), while possibly erroneous answers have low cost. Motivated by these considerations, we study algorithms with non-trivial trade-offs between the number of exact (high-cost) operations and noisy (low-cost) operations, together with provable performance guarantees. Specifically, we study a class of polynomial-time graph-based clustering algorithms (termed Single-Linkage) which are widely used in practice and which guarantee exact solutions for stable instances of several clustering problems (these problems are NP-hard in the worst case). We provide several variants of these algorithms using ordinal operations and, in particular, non-trivial trade-offs between the number of high-cost and low-cost operations used. Our algorithms still guarantee exact solutions for stable instances of k-medoids clustering, and they use a rather small number of high-cost operations without increasing the number of low-cost operations too much.
Highlights
Clustering is a fundamental and widely studied problem in machine learning and in computational complexity
We address these questions by (i) introducing a formal model and (ii) considering a class of clustering problems/algorithms in this model
This work focuses on the so-called k-medoids clustering problem, where the center of each cluster must be a point of the cluster [16,17,18,19]
Summary
Clustering is a fundamental and widely studied problem in machine learning and in computational complexity (see [4] for a nice introduction). Roughly speaking, the Single-Linkage algorithm first computes a (minimum) spanning tree over the pairwise distances or dissimilarities of the data, and then removes a suitable subset of edges to obtain the optimal k-clustering. Our setting combines three aspects: using only ordinal information; dealing with noisy data; and allowing expensive operations to remove errors. This situation arises, for example, in semi-active learning approaches where the pairwise dissimilarity between objects (data points) is evaluated by humans via simple comparison queries (see, e.g., [5,6,7] and references therein); such evaluations are inherently subject to errors. What trade-offs between expensive and non-expensive (noisy) operations still allow for finding optimal solutions?
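The MST-based clustering step described above can be sketched in a few lines. The snippet below is a minimal illustration (not the paper's ordinal or noise-tolerant variants): it runs Kruskal's algorithm with union-find on a toy dissimilarity matrix and stops before performing the k-1 most expensive merges, so that k connected components (clusters) remain. The matrix `D` is a made-up example.

```python
def single_linkage(dissim, k):
    """Return a cluster label in 0..k-1 for each of the n points."""
    n = len(dissim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort all pairs by dissimilarity, as Kruskal's algorithm does.
    edges = sorted((dissim[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n))

    merges = 0
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges += 1
            if merges == n - k:  # stop before the k-1 largest MST edges
                break

    # Relabel the surviving components as 0..k-1.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy instance: two well-separated groups, {0, 1, 2} and {3, 4}.
D = [
    [0, 1, 2, 9, 9],
    [1, 0, 1, 9, 9],
    [2, 1, 0, 9, 9],
    [9, 9, 9, 0, 1],
    [9, 9, 9, 1, 0],
]
labels = single_linkage(D, k=2)  # → [0, 0, 0, 1, 1]
```

Note that only the *order* of the pairwise dissimilarities matters here, not their exact values, which is what makes the ordinal-query setting natural for this class of algorithms.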