Relevance as a subjective and situational multidimensional concept

Carsten Eickhoff

doi:10.1145/2348283.2348422

Abstract

Relevance is the central concept of information retrieval. Although its important role is unanimously accepted among researchers, numerous different definitions of the term have emerged over the years. Considerable effort has been put into creating consistent and universally applicable descriptions of relevance in the form of relevance frameworks. Across these various formal systems of relevance, a wide range of relevance criteria has been identified. The probably most frequently used single criterion, that in some applications even becomes a synonym for relevance, is topicality. It expresses a document's topical overlap with the user's information need. For textual resources, it is often estimated based on term co-occurrences between query and document. There is, however, a significant number of further noteworthy relevance criteria. Prominent specimen are: (Currency) determines how recent and up to date the document is. Outdated information may have become invalid over time.(Availability) expresses how easy it is to obtain the document. Users might not want to invest more than a threshold amount of resources (e.g., disk space, downloading time or money) to get the document.(Readability) describes the document's readability and understandability. A document with a high topical relevance towards a given information need can become irrelevant if the user is not able to extract the desired information from it.(Credibility) contains criteria such as the document author's expertise, the publication's reputation and the document's general trustworthiness.(Novelty) describes the document's contribution to satisfying an information need with respect to the user's context. E.g., previous search results or general knowledge about the domain.It is evident that these criteria can have very different scopes. Some of them are static characteristics of the document or the author, others depend on the concrete information need at hand or even the user's search context. Currently, state-of-the-art retrieval models often treat relevance (regardless which interpretation of the term was chosen) as an atomic concept that can be expressed through topical overlap between document and query or a plain linear combination of multiple scores. Considering the broad audiences a web search engine has to serve, such a method does not seem optimal as the concrete composition of relevance will vary from person to person depending on social and educational context. Furthermore, each individual can be expected to have situational preference for certain combinations of relevance facets depending on the information need at hand. We investigate combination schemes which respect the dimension-specific relevance distributions. In particular, we developed a risk-aware method of combining relevance criteria inspired by the economic Portfolio theory. As a first stage, we applied this method for result set diversification across dimensions.

Full Text