Max-Sum Diversification, Monotone Submodular Functions, and Dynamic Updates

Allan Borodin, Aadhar Jain, Hyun Chul Lee, Yuli Ye

doi:10.1145/3086464

Abstract

Result diversification is an important aspect of web-based search, document summarization, facility location, portfolio management, and other applications. Given a set of ranked results for a set of objects (e.g., web documents, facilities, etc.) with a distance between any pair, the goal is to select a subset S satisfying the following three criteria: (a) the subset S satisfies some constraint (e.g., bounded cardinality), (b) the subset contains results of high “quality,” and (c) the subset contains results that are “diverse” relative to the distance measure. The goal of result diversification is to produce a diversified subset while keeping quality as high as possible. We study a broad class of problems where the distances are a metric, the constraint is given by independence in a matroid, quality is determined by a monotone submodular function, and diversity is defined as the sum of distances between objects in S. Our problem is a generalization of the max-sum diversification problem studied in Gollapudi and Sharma [2009], which in turn is a generalization of the max-sum p-dispersion problem studied extensively in location theory. It is NP-hard even when the distances satisfy the triangle inequality. We propose two simple and natural algorithms: a greedy algorithm for a cardinality constraint and a local search algorithm for an arbitrary matroid constraint. We prove that both algorithms achieve constant approximation ratios.
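To make the cardinality-constrained setting concrete, the following is a minimal Python sketch of a greedy selection rule for an objective of the form f(S) + λ · (sum of pairwise distances in S). Several details are assumptions not stated in the abstract: the trade-off parameter `lam`, the half weight placed on the marginal quality gain, the tie-breaking by input order, and all function names (`objective`, `greedy_diversify`, `quality`, `dist`). It is meant as an illustration of the general idea, not as the exact algorithm analyzed in the paper.

```python
from itertools import combinations


def objective(S, quality, dist, lam):
    """Combined score: quality(S) plus lam times the sum of pairwise distances."""
    return quality(S) + lam * sum(dist(u, v) for u, v in combinations(S, 2))


def greedy_diversify(items, k, quality, dist, lam):
    """Greedily pick up to k items (a sketch under the assumptions above).

    quality: function from a list of items to a number, assumed monotone submodular.
    dist:    function from a pair of items to a number, assumed to be a metric.
    lam:     hypothetical weight trading off quality against diversity.
    """
    S = []
    remaining = list(items)
    while len(S) < k and remaining:
        base = quality(S)

        def gain(u):
            marginal = quality(S + [u]) - base    # quality gained by adding u
            spread = sum(dist(u, v) for v in S)   # distance from u to the current set
            return 0.5 * marginal + lam * spread  # half weight is an assumption

        best = max(remaining, key=gain)  # ties go to the earliest item
        S.append(best)
        remaining.remove(best)
    return S


# Toy usage: three points on a line, each with unit (modular) quality.
points = [0, 1, 10]
unit_quality = lambda S: float(len(S))
line_dist = lambda u, v: abs(u - v)

chosen = greedy_diversify(points, 2, unit_quality, line_dist, lam=1.0)
score = objective(chosen, unit_quality, line_dist, lam=1.0)
```

In this toy instance all items have equal quality, so the first pick falls to the earliest item and the second pick is driven entirely by distance. Each step costs one quality evaluation per remaining item plus distance sums over the current set, which keeps the sketch simple at the expense of efficiency.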

Full Text