Abstract

The goal of top-k ranking is to rank individuals so that the best k of them can be determined. Depending on the application domain, an individual can be a person, a product, an event, or just a collection of data or information for which an ordering makes sense. The problem of top-k ranking has profound commercial and social implications. In the context of databases, top-k ranking has been studied in two distinct directions, depending on whether the stored information is certain or uncertain. In the former, since there is little dispute on what top-k ranking is, the past research has focused on efficient query processing. In the latter case, a number of semantics based on possible worlds have been proposed and computational mechanisms investigated for what are called uncertain databases or probabilistic databases, where a tuple is associated with a membership probability indicating the level of confidence on the stored information. In this thesis, we study top-k ranking with uncertain data in two general areas. The first is on pruning for the computation of top-k tuples in a probabilistic database. We investigate the theoretical basis and practical means of pruning for the recently proposed, unifying framework based on parameterized ranking functions. As such, our results are applicable to a wide range of ranking functions. We show experimentally that pruning can generate orders of magnitude performance gains. In the second area of our investigation, we study the problem of top-k ranking for objects with multiple attributes whose values are modeled by probability distributions and constraints. We formulate a theory of top-k ranking for objects by a characterization of what constitutes the strength of an object, and show that a number of previous proposals for top-k ranking are special cases of our theory. We carry out a limited study on computation of top-k objects under our theory. We reveal the close connection between top-k ranking in this context and high-dimensional space studied in mathematics, in particular, the problem of computing the volumes of high-dimensional polyhedra expressed by linear inequations is a special case of top-k ranking of objects, and as such, the algorithms formulated for the former can be employed for the latter under the same conditions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call