Abstract

A top-k query retrieves the k tuples with the highest scores according to a user preference, which is defined as a scoring function. Users find it difficult to specify the scoring function precisely. Obtaining a distribution over scoring functions, i.e., the preference distribution, has instead been explored extensively in many fields. Motivated by this, we introduce the uniform (r,k)-hit (UrkHit) problem. Given a preference distribution, UrkHit selects a representative set of r tuples that maximizes the probability of containing a tuple attractive to the user. We say a tuple attracts a user if it is a top-k tuple for the scoring function adopted by that user. We then generalize UrkHit to the (r,k)-hit (rkHit) problem, which adds a penalty function that models the user's satisfaction with the tuple ranked i-th. rkHit maximizes the expected user satisfaction with the representative set. In 2D space, we design an exact algorithm, 2DH, for rkHit, which shows that rkHit is in P when the dimensionality d = 2. We show that rkHit is NP-hard when d ≥ 3. In 3D space, assuming a uniform preference distribution, we propose a (1-1/e)-approximation algorithm, 3DH, based on space partitioning. In addition, we propose an approximation algorithm, MDH, that applies to any dimension and any distribution by combining sampling and clustering, at the cost of a slightly weaker approximation guarantee. Comprehensive experiments demonstrate the efficiency and effectiveness of our algorithms.
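To make the UrkHit objective concrete, the following is a minimal sketch and not the paper's 2DH, 3DH, or MDH. It assumes linear scoring functions whose nonnegative weight vectors are drawn uniformly from the simplex, estimates the hit probability by Monte Carlo sampling, and picks the r tuples with a plain greedy coverage heuristic. The names `sample_weights`, `top_k`, `hit_probability`, and `greedy_rk_hit` are hypothetical and introduced only for illustration.

```python
import random


def sample_weights(d, rng):
    # Assumption: preferences are linear weights, uniform on the simplex.
    # Normalized exponential variates give a uniform point on the simplex.
    raw = [rng.expovariate(1.0) for _ in range(d)]
    total = sum(raw)
    return [x / total for x in raw]


def top_k(tuples, w, k):
    # Indices of the k tuples with the highest linear score under weights w.
    scores = [sum(wi * xi for wi, xi in zip(w, t)) for t in tuples]
    order = sorted(range(len(tuples)), key=lambda i: -scores[i])
    return set(order[:k])


def hit_probability(topk_sets, chosen):
    # Fraction of sampled users whose top-k set contains a chosen tuple.
    chosen = set(chosen)
    hits = sum(1 for s in topk_sets if s & chosen)
    return hits / len(topk_sets)


def greedy_rk_hit(tuples, r, k, n_samples=2000, seed=0):
    # Sample scoring functions, then greedily pick the tuple that "hits"
    # the largest number of still-unsatisfied sampled users.
    rng = random.Random(seed)
    d = len(tuples[0])
    topk_sets = [top_k(tuples, sample_weights(d, rng), k)
                 for _ in range(n_samples)]
    chosen = []
    uncovered = list(range(n_samples))
    for _ in range(min(r, len(tuples))):
        gain = {}
        for j in uncovered:
            for i in topk_sets[j]:
                gain[i] = gain.get(i, 0) + 1
        if not gain:  # every sampled user is already satisfied
            break
        best = max(gain, key=lambda i: (gain[i], -i))
        chosen.append(best)
        uncovered = [j for j in uncovered if best not in topk_sets[j]]
    return chosen, hit_probability(topk_sets, chosen)


if __name__ == "__main__":
    data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.1, 0.1)]
    chosen, prob = greedy_rk_hit(data, r=2, k=1)
    print(chosen, round(prob, 3))
```

In this toy 2D dataset with k = 1, the first two tuples are each the top choice for roughly 40% of uniformly drawn weight vectors, so a representative set of size 2 attains an estimated hit probability near 0.8. The estimate is only as good as the sample size, and the greedy step optimizes the sampled objective rather than the true one.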
