Abstract

Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Instead, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which raises the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
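Concretely, the decoding objectives alluded to above can be sketched in the following regularized form; the variance regularizer shown here is one illustrative way to encode the uniform information density property, not necessarily the only formulation considered:

    y* = argmax_y  log p(y | x) − λ · R(y)

    R_var(y) = (1/|y|) · Σ_t (u_t − ū)²,   where u_t = −log p(y_t | y_<t, x) is the surprisal of token t and ū is the mean surprisal of y

For a fixed total log-probability, such an objective prefers candidates whose information content is spread evenly across tokens.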

Highlights

  • As a simple search heuristic, beam search has been used to decode models developed by the NLP community for decades

  • In the context of neural machine translation (NMT), a shocking empirical finding has emerged: Using beam search to decode sentences from neural text generators almost invariably leads to better text than using exact search

  • We propose a regularizer of our own design, inspired by the uniform information density (UID) hypothesis, that exploits the nature of maximum a posteriori (MAP) decoding, whose overarching goal is to find a solution with low surprisal (a concrete sketch follows this list)
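To make this kind of regularizer concrete, the following is a minimal Python sketch. The variance penalty is an illustrative UID-style regularizer rather than a definitive reproduction of the paper's formulation, and the per-token log-probabilities stand in for the output of whatever autoregressive model is being decoded:

    def uid_variance_penalty(token_log_probs):
        # Surprisal of each token: u_t = -log p(y_t | y_<t, x).
        u = [-lp for lp in token_log_probs]
        mean_u = sum(u) / len(u)
        # Variance of per-token surprisals; low variance means information is spread evenly.
        return sum((x - mean_u) ** 2 for x in u) / len(u)

    def regularized_score(token_log_probs, lam=1.0):
        # UID-regularized objective: log p(y | x) minus a penalty on uneven surprisal.
        return sum(token_log_probs) - lam * uid_variance_penalty(token_log_probs)

    # Two hypotheses with identical total log-probability: the one whose surprisal
    # is spread evenly scores higher under the regularized objective.
    even  = [-1.0, -1.0, -1.0, -1.0]
    spiky = [-0.1, -0.1, -0.1, -3.7]
    print(regularized_score(even))   # -4.0
    print(regularized_score(spiky))  # -6.43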


Summary

Introduction

As a simple search heuristic, beam search has been used to decode models developed by the NLP community for decades. Notably, it is one of the few NLP algorithms that have stood the test of time: it has remained a cornerstone of NLP systems since the 1970s (Reddy, 1977). As such, it became the natural choice for decoding neural probabilistic text generators, whose design makes evaluating the full search space impossible (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Vinyals and Le, 2015; Yin et al., 2016). Stahlberg and Byrne (2019) report that exact search over such models frequently returns strikingly poor output, often the empty string, whereas beam search with a small beam yields far better translations.
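To ground the discussion, here is a minimal sketch of the beam search procedure in question. The next_token_log_probs function is a hypothetical stand-in for a neural model's conditional distribution, and end-of-sequence bookkeeping and length normalization are simplified:

    def beam_search(next_token_log_probs, vocab, bos, eos, beam_size=5, max_len=20):
        """Standard beam search: repeatedly extend each prefix on the beam with every
        vocabulary item and keep the beam_size prefixes with the highest total
        log-probability."""
        beam = [([bos], 0.0)]  # each hypothesis is (token sequence, cumulative log-probability)
        for _ in range(max_len):
            candidates = []
            for seq, score in beam:
                if seq[-1] == eos:  # finished hypotheses are carried forward unchanged
                    candidates.append((seq, score))
                    continue
                log_probs = next_token_log_probs(seq)  # maps token -> log p(token | prefix)
                for tok in vocab:
                    candidates.append((seq + [tok], score + log_probs[tok]))
            beam = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
            if all(seq[-1] == eos for seq, _ in beam):
                break
        return max(beam, key=lambda h: h[1])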
