Abstract

A practical and well-studied method for computing the novelty of a design is to construct an ordinal embedding from a collection of relative comparisons between items (called triplets), and then use distances within that embedding to determine which designs lie farthest from the center. Unfortunately, ordinal embedding methods can require a large number of triplets before their primary error measure, the triplet violation error, converges. But if our goal is accurate novelty estimation, is it really necessary to fully minimize all triplet violations? Can we extract useful information about the novelty of all or some items using fewer triplets than classical convergence rates might imply? This paper addresses this question by studying the relationship between triplet violation error and novelty score error when using ordinal embeddings. Specifically, we compare how errors in embeddings produced by Generalized Non-metric Multidimensional Scaling (GNMDS) converge under different sampling methods, numbers of embedded items, and sizes of the latent space, and for the top K most novel designs. We find that estimating the novelty of a set of items via ordinal embedding can require significantly fewer human-provided triplets than are needed for the triplet error to converge, and that this effect is modulated by the type of triplet sampling method (random versus uncertainty sampling). We also find that uncertainty sampling produces unique convergence behavior when estimating the most novel items compared to non-novel items. Our results imply that, in certain situations, one can use ordinal embedding techniques to estimate novelty with fewer samples than is typically expected. Moreover, the convergence behavior of the top K novel items motivates potential new triplet sampling methods that go beyond typical triplet reduction measures.
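As a rough sketch of the novelty computation described above (an illustration under assumptions, not the authors' exact implementation), one can score each embedded item by its Euclidean distance from the embedding's centroid, take the K farthest items as the most novel, and measure the triplet violation error as the fraction of collected triplets the embedding fails to satisfy. The NumPy-based function names below are hypothetical and chosen only for illustration.

```python
import numpy as np

def novelty_scores(embedding):
    """Score each item by its Euclidean distance from the embedding centroid.

    embedding: (n_items, d) array of item coordinates in the latent space.
    """
    center = embedding.mean(axis=0)
    return np.linalg.norm(embedding - center, axis=1)

def top_k_novel(embedding, k):
    """Return indices of the K items farthest from the center (most novel)."""
    scores = novelty_scores(embedding)
    return np.argsort(scores)[::-1][:k]

def triplet_violation_error(embedding, triplets):
    """Fraction of triplets (i, j, k), read as 'item i is closer to j than to k',
    that the embedding fails to satisfy."""
    violations = 0
    for i, j, k in triplets:
        d_ij = np.linalg.norm(embedding[i] - embedding[j])
        d_ik = np.linalg.norm(embedding[i] - embedding[k])
        if d_ij >= d_ik:
            violations += 1
    return violations / len(triplets)
```

Under this reading, the paper's question is whether the top-K ranking returned by a function like top_k_novel stabilizes well before triplet_violation_error has converged.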
