Upper bounds for retrieval performance and their use measuring performance and generating optimal Boolean queries: Can it get any better than this?

Robert M Losee

doi:10.1016/0306-4573(94)90064-7

Abstract

The best-case, random, and worst-case document rankings and retrieval performance may be determined using a method discussed here. Knowledge of the best-case performance allows users and system designers to (a) determine how close to optimality their search is and (b) select queries and matching functions that will produce the best results. A method for deriving the optimal Boolean query for a given level of recall is suggested, as is a method for determining the quality of a Boolean query. Measures are proposed that modify conventional text retrieval measures such as precision, E, and average search length, so that the values for these measures are 1 when retrieval is optimal, 0 when retrieval is random, and −1 when retrieval is worst-case. Tests using one of these measures show that many retrievals are optimal. Consequences for retrieval research are examined.
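As a rough illustration of the rescaling described in the abstract, the sketch below normalizes average search length (ASL) so that the best-case ranking scores 1, the expected random ranking scores 0, and the worst-case ranking scores −1. It is not taken from the paper: the piecewise-linear scaling, the binary relevance judgments, the absence of ties in the ranking, and the function names `asl` and `normalized_asl` are all assumptions made for the example.

```python
def asl(ranking):
    """Average search length: mean 1-based rank of the relevant documents.

    `ranking` is a list of 0/1 relevance flags in ranked order
    (an assumed input format, not one specified by the paper).
    """
    positions = [i for i, rel in enumerate(ranking, start=1) if rel]
    if not positions:
        raise ValueError("ranking contains no relevant documents")
    return sum(positions) / len(positions)


def normalized_asl(ranking):
    """Rescale ASL to 1 (best case), 0 (random), -1 (worst case).

    Assumes a piecewise-linear rescaling between the three reference
    points; the paper's own formula may differ.
    """
    n = len(ranking)
    r = sum(1 for rel in ranking if rel)
    if r == n:
        # Every ordering is identical, so the three cases coincide.
        raise ValueError("measure is undefined when every document is relevant")

    observed = asl(ranking)          # raises if r == 0
    best = (r + 1) / 2               # all relevant documents ranked first
    worst = n - (r - 1) / 2          # all relevant documents ranked last
    rand = (n + 1) / 2               # expected ASL under a random ordering

    if observed <= rand:
        return (rand - observed) / (rand - best)
    return -(observed - rand) / (worst - rand)


if __name__ == "__main__":
    print(normalized_asl([1, 1, 0, 0]))  # relevant documents first
    print(normalized_asl([1, 0, 0, 1]))  # ASL equal to the random expectation
    print(normalized_asl([0, 0, 1, 1]))  # relevant documents last
```

For ASL the best and worst cases sit symmetrically around the random expectation, so the two branches reduce to the same linear map. They are kept separate here because that symmetry need not hold for other measures such as precision or E.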

Full Text