An empirical assessment of quality metrics for diversified similarity searching

Camila R Lopes, Daniel De Oliveira, Marcos Bedo, Lúcio F D Santos, Daniel L Jasbick

doi:10.5753/jidm.2021.1917

Abstract

A diversified similarity search retrieves elements that are simultaneously similar to a query object and akin to the different collections within the explored data. Several methods in information retrieval, data clustering, and similarity searching have tackled the problem of adding diversity to result sets, but comparing their performance experimentally remains an open issue, mainly because the quality metrics are "borrowed" from those different research areas and bring their biases along. In this manuscript, we investigate a series of such metrics and experimentally discuss their trends and limitations. We conclude that diversity is better addressed by a set of measures than by a single quality index, and we introduce the concept of the Diversity Features Model (DFM), which combines the viewpoints of biased metrics into a multidimensional representation. Experimental evaluations indicate that (i) DFM enables comparing result diversification algorithms across multiple criteria, and (ii) combining DFM with ranking aggregation and parallel coordinates maps spots the most suitable search methods for a particular dataset.
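The abstract does not specify which metrics compose DFM or which ranking aggregation scheme is used. As a hedged illustration only, the Python sketch below assumes three stand-in criteria (mean distance to the query, minimum pairwise distance among the results, and the fraction of dataset groups covered), treats each algorithm's criteria as a small feature vector, and aggregates per-criterion rankings with a plain Borda count. The criteria, the names `feature_vector`, `borda`, and `ORIENTATION`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

Point = tuple[float, float]

# Hypothetical orientation of each criterion: -1 means lower is better,
# +1 means higher is better. This is an assumption, not DFM's definition.
ORIENTATION = (-1, +1, +1)


def feature_vector(query: Point, results: list[Point],
                   labels: dict[Point, str]) -> tuple[float, float, float]:
    """Hypothetical DFM-like vector for one result set:
    (mean distance to query, min pairwise distance, group coverage)."""
    mean_q = sum(math.dist(query, r) for r in results) / len(results)
    # Smallest distance between any two returned elements (0.0 if fewer than two).
    min_pair = min((math.dist(a, b) for a, b in combinations(results, 2)),
                   default=0.0)
    # Fraction of the dataset's distinct groups represented in the result set.
    coverage = len({labels[r] for r in results}) / len(set(labels.values()))
    return (mean_q, min_pair, coverage)


def borda(vectors: dict[str, tuple[float, ...]]) -> list[tuple[str, int]]:
    """Borda count over criteria: per criterion, the worst method gets 0 points
    and the best gets len(vectors) - 1; points are summed across criteria."""
    names = list(vectors)
    scores = {n: 0 for n in names}
    for k, sign in enumerate(ORIENTATION):
        # Ascending by the oriented value, so the best method comes last.
        ordered = sorted(names, key=lambda n: sign * vectors[n][k])
        for points, name in enumerate(ordered):
            scores[name] += points
    return sorted(scores.items(), key=lambda item: -item[1])


# Toy dataset: each point is labeled with the group it belongs to.
labels = {(0, 0): "A", (0, 1): "A", (5, 5): "B", (5, 6): "B", (10, 0): "C"}
query = (0, 0)
results = {
    "knn": [(0, 0), (0, 1)],                # close to the query, one group
    "diverse": [(0, 0), (5, 5), (10, 0)],   # farther away, all groups
}

vectors = {name: feature_vector(query, res, labels) for name, res in results.items()}
ranking = borda(vectors)
print(ranking)  # [('diverse', 2), ('knn', 1)]
```

Python's sort is stable, so ties on a criterion are broken by insertion order here; a more careful aggregation would assign averaged ranks to tied methods. Keeping the criteria as separate vector components, rather than collapsing them into one score up front, mirrors the abstract's point that no single index captures diversity.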
