Abstract

Pretrained contextualized language models such as BERT and T5 have established a new state of the art for ad-hoc search. However, it is not yet well understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic probes that allow us to test several characteristics, such as sensitivity to word order, that previous techniques do not address. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to the neural models' gains and identifies potential unintended biases the models exhibit. We find evidence that recent neural ranking models are fundamentally different from prior ranking models: they rely less on exact term overlap with the query and instead leverage richer linguistic information, as evidenced by their much higher sensitivity to word and sentence order. We also find that the pretrained language model alone does not dictate a system's behavior in ad-hoc retrieval: the same model within different ranking architectures can exhibit very different behavior.
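The contrast between exact term overlap and word-order sensitivity rests on a simple property: a scorer that depends only on term counts cannot change its score when a document's words are reordered, so a nonzero score change under shuffling indicates that a model uses order information. The Python sketch below illustrates this with a toy overlap scorer. It is a hypothetical illustration, not ABNIRML's actual probe; the names `term_overlap_score`, `shuffle_words`, and `order_sensitivity` are assumptions introduced here.

```python
import random
from collections import Counter


def term_overlap_score(query: str, doc: str) -> int:
    """Toy bag-of-words scorer (hypothetical): counts document tokens that also occur in the query."""
    q_terms = set(query.lower().split())
    doc_counts = Counter(doc.lower().split())
    return sum(count for term, count in doc_counts.items() if term in q_terms)


def shuffle_words(text: str, seed: int = 0) -> str:
    """Hypothetical word-order perturbation: same words, random order, reproducible via seed."""
    words = text.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)


def order_sensitivity(score_fn, query: str, doc: str, seed: int = 0) -> int:
    """Score change caused by shuffling the document's words; 0 means the scorer ignored order."""
    return score_fn(query, shuffle_words(doc, seed)) - score_fn(query, doc)


query = "neural ranking models"
doc = "pretrained neural models improve ranking quality for ad-hoc search"
print(order_sensitivity(term_overlap_score, query, doc))  # → 0
```

Because shuffling preserves the multiset of tokens, any purely count-based scorer yields a change of 0 for every seed; a contextualized ranker passed as `score_fn` would generally not.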
