Abstract

The number of terms in a query is a query-specific constant that is typically ignored in retrieval functions. However, previous studies have shown that the performance of retrieval models varies with query length and usually degrades as queries grow longer. One possible reason is that longer queries contain extraneous terms, which make it harder for retrieval models to distinguish the key concepts of a query from its complementary ones. Inverse document frequency (IDF), as a signal of term importance, can be used to discriminate among query terms. In this paper, we propose a constraint that models the interaction between query length and IDF. Our theoretical analysis shows that current state-of-the-art retrieval models, such as BM25, do not satisfy the proposed constraint. We further analyze BM25 and suggest a modification that makes it adhere to the new constraint. Our experiments on three TREC collections demonstrate that the proposed modification outperforms the baselines, especially for verbose queries.
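For readers less familiar with BM25, the sketch below shows the standard Okapi BM25 score. It is not the modification proposed in this paper, which the abstract does not specify. It assumes the Lucene-style IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), the common defaults k1 = 1.2 and b = 0.75, and distinct query terms; the function names are hypothetical. Each query term contributes an IDF-weighted term that does not depend on how many other terms the query contains, so query length enters the score only through the number of summands and never interacts with IDF.

```python
import math
from collections import Counter


def idf(df: int, n_docs: int) -> float:
    # Lucene-style BM25 IDF (assumed variant); the +1 keeps it non-negative.
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_score(query: list[str], doc: list[str], docs: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Standard BM25 score of a tokenized doc for a query of distinct terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    dl = len(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue  # term absent from the collection or from this doc
        # Saturated, length-normalized term frequency.
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        # Per-term weight: IDF times normalized tf; the query length |q| never appears.
        score += idf(df, n_docs) * norm
    return score


docs = [["query", "length", "idf"], ["bm25", "idf", "model"], ["query", "model"]]
doc = docs[0]
# The score of a two-term query is exactly the sum of the single-term scores.
print(bm25_score(["query", "idf"], doc, docs))
print(bm25_score(["query"], doc, docs) + bm25_score(["idf"], doc, docs))
```

Because the score is a plain sum of independent per-term weights, adding extraneous terms to a query adds more summands but never changes how the weight of an existing term depends on its IDF.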
