Abstract

This work focuses on active learning from relative comparison information. A relative comparison specifies, for a data triplet $(x_i, x_j, x_k)$, that instance $x_i$ is more similar to $x_j$ than to $x_k$. Such constraints, when available, have been shown to be useful for learning tasks such as defining appropriate distance metrics or finding good clustering solutions. In real-world applications, acquiring constraints often involves considerable human effort, as it requires the user to manually inspect the instances. This motivates us to study how to select and query the most useful relative comparisons to achieve effective learning with minimum user effort. Given an underlying class concept that is employed by the user to provide such constraints, we present an information-theoretic criterion that selects the triplet whose answer leads to the highest expected information gain about the classes of a set of examples. Directly applying the proposed criterion requires examining $O(n^3)$ triplets for $n$ instances, which is prohibitive even for datasets of moderate size. We show that a randomized selection strategy can be used to reduce the selection pool from $O(n^3)$ to $O(n)$ with minimal loss in efficiency, allowing us to scale up to considerably larger problems. Experiments show that the proposed method consistently outperforms baseline policies.
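
The selection idea can be illustrated with a minimal sketch. It is not the paper's exact criterion, and it rests on several assumptions that do not come from the abstract: the current model supplies a class-probability vector per instance, labels are treated as independent across instances, and the user answers "yes" when $x_i$ and $x_j$ share a class that $x_k$ lacks, "no" when $x_i$ and $x_k$ share a class that $x_j$ lacks, and "don't know" otherwise. Under those assumptions the answer is a deterministic function of the three labels, so the expected information gain about the labels equals the entropy of the answer. All function names here are hypothetical.

```python
import math
import random


def answer_distribution(p, i, j, k):
    """Return [P(yes), P(no), P(don't know)] for the triplet (i, j, k).

    p[m][c] is the assumed probability that instance m belongs to class c.
    Labels are treated as independent across instances (an assumption).
    """
    n_classes = len(p[i])
    # yes: x_i and x_j share class c, and x_k is in some other class
    p_yes = sum(p[i][c] * p[j][c] * (1.0 - p[k][c]) for c in range(n_classes))
    # no: x_i and x_k share class c, and x_j is in some other class
    p_no = sum(p[i][c] * p[k][c] * (1.0 - p[j][c]) for c in range(n_classes))
    # don't know: every remaining label configuration
    p_dk = max(0.0, 1.0 - p_yes - p_no)
    return [p_yes, p_no, p_dk]


def entropy_bits(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(q * math.log2(q) for q in dist if q > 0.0)


def information_gain(p, i, j, k):
    """Expected information gain about the labels from querying (i, j, k).

    Because the answer is assumed deterministic given the labels,
    I(labels; answer) = H(answer).
    """
    return entropy_bits(answer_distribution(p, i, j, k))


def select_triplet(p, pool_size, rng):
    """Score a random pool of triplets and return the best one with its gain.

    Drawing pool_size = O(n) triplets avoids enumerating all O(n^3) of them.
    """
    n = len(p)
    best, best_gain = None, -1.0
    for _ in range(pool_size):
        i, j, k = rng.sample(range(n), 3)  # three distinct instance indices
        gain = information_gain(p, i, j, k)
        if gain > best_gain:
            best, best_gain = (i, j, k), gain
    return best, best_gain
```

With two equally likely classes for every instance, each triplet has answer probabilities 0.25, 0.25, and 0.5, so every candidate scores 1.5 bits. When all three labels are known with certainty, the answer is predictable and the gain is zero, so such a triplet would never be preferred.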
