Abstract

Document corpora are key components of information retrieval test collections. However, for certain tasks, such as evaluating the effectiveness of a new retrieval technique or estimating the parameters of a learning-to-rank model, a corpus alone is not enough: queries and relevance judgments associated with the corpus are also necessary. Researchers often have access only to a corpus, in which case evaluation and learning to rank become challenging.

Document corpora are relatively straightforward to gather, whereas obtaining queries and relevance judgments for a given corpus is costly. In production environments, it may be possible to obtain low-cost relevance information from query and click logs. In more constrained research environments these options are not available, and relevance judgments are usually provided by humans. To reduce the cost of this process, researchers have developed low-cost evaluation strategies, including minimal test collections [2] and crowdsourcing [1]. Despite the usefulness of these strategies, the resulting relevance judgments cannot easily be “ported” to a new or different corpus.

To overcome these issues, we propose a new method that reduces manual annotation costs by transferring relevance judgments across corpora. Assuming that a set of queries and relevance judgments has been manually constructed for a source document corpus Ds, our goal is to automatically construct a test collection for a target document corpus Dt by projecting the existing test collection from Ds onto Dt. The goal of projecting test collections is not to produce manual-quality test collections. In fact, it is assumed that projected test collections will contain noisy relevance judgments (i.e., ones with which humans are unlikely to agree). The important question, however, is whether these noisy projected judgments are useful for training ranking models in the target corpus.
