Abstract
In natural language generation, most decoding methods are not intrinsic: their performance depends on extrinsically configured hyperparameters. This has two consequences. First, the generation system behaves dynamically under different conditions, yet once its hyperparameters are extrinsically fixed, the decoding system remains static under all conditions. Second, it is hard to select a single constant decoding hyperparameter that works well for every condition. There are hyperparameter-free decoding methods, such as greedy decoding and plain sampling, but it is well established that they generally perform worse than methods with hyperparameters, such as beam search, top-k, and top-p. A method with hyperparameters yields infinitely many strategies from its different fixed configurations, whereas a hyperparameter-free method yields only one; the usual comparison between them is therefore unfair, a one-vs-infinite battle. How, then, should decoding hyperparameters be handled properly and intrinsically? Are hyperparameter-free methods necessarily inferior to methods with inexhaustible hyperparameter configurations? Can a generalized framework be designed in which these decoding methods are naturally connected, uniformly described, and mutually inspired? In this paper, we seek answers to these questions. To this end, we first propose a generalized decoding framework, GSD, that uniformly describes and connects existing popular decoding methods. To the best of our knowledge, this is the first work to build a theoretical framework that relates these decoding methods through formal mathematical theorems. Building on this framework, we then propose Intrinsic Decoding, a novel implementation of GSD whose design is distinct from existing decoding algorithms: it is intrinsic and dynamic, turning the aforementioned comparison from one-vs-infinite into dynamic-vs-infinite.
Like greedy decoding and sampling, Intrinsic Decoding has no hyperparameters, yet it performs better than both, and it even achieves performance comparable to methods equipped with inexhaustible hyperparameter configurations, such as beam search, top-k, and top-p.
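To make the contrast concrete, below is a minimal sketch (not the paper's implementation of GSD or Intrinsic Decoding) of the two families of methods discussed above: greedy decoding, which is hyperparameter-free, versus top-k and top-p (nucleus) sampling, whose behavior depends on the extrinsic hyperparameters `k` and `p`. All function names here are illustrative.

```python
import random

def greedy(probs):
    # Hyperparameter-free: always pick the most probable token.
    return max(range(len(probs)), key=lambda i: probs[i])

def top_k_sample(probs, k, rng):
    # Hyperparameter k: keep only the k most probable tokens,
    # renormalize, then sample from the truncated distribution.
    kept = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

def top_p_sample(probs, p, rng):
    # Hyperparameter p: keep the smallest set of most probable tokens
    # whose cumulative mass reaches p, renormalize, then sample.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# A toy next-token distribution over a 5-token vocabulary.
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
rng = random.Random(0)
print(greedy(probs))                  # always token 0
print(top_k_sample(probs, 2, rng))    # restricted to tokens {0, 1}
print(top_p_sample(probs, 0.8, rng))  # restricted to tokens {0, 1}
```

Changing `k` or `p` changes which tokens survive truncation, so each configuration is effectively a different decoding strategy; greedy decoding, by contrast, admits exactly one.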