Abstract
The natural language generation (NLG) component of a spoken dialogue system (SDS) usually requires substantial handcrafting or a well-labeled training dataset. These limitations add significantly to development costs and make cross-domain, multilingual dialogue systems intractable. Moreover, human language is context-aware: the most natural response should be learned directly from data rather than depending on predefined syntax or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure that can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that the new model outperforms previous methods under the same experimental conditions. An evaluation by human judges indicates that it produces not only high-quality but also linguistically varied utterances, which are preferred over those of n-gram and rule-based systems.
Highlights
Conventional spoken dialogue systems (SDS) are expensive to build because many of the processing components require a substantial amount of handcrafting (Ward and Issar, 1994; Bohus and Rudnicky, 2009).
To account for arbitrary slot-value pairs that cannot be routinely delexicalized in our corpus, Section 3.1 describes a convolutional neural network (CNN) (Collobert and Weston, 2008; Kalchbrenner et al., 2014) sentence model that is used to validate the semantic consistency of candidate utterances during reranking.
In this paper, a neural network-based natural language generator has been presented in which a forward RNN generator, a CNN reranker, and a backward RNN reranker are jointly optimised to generate utterances conditioned on the required dialogue act.
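The highlights refer to delexicalizing slot-value pairs, i.e. replacing concrete values in training utterances with placeholder tokens so the generator learns slot-independent sentence patterns, then filling the values back in after generation. As a minimal illustrative sketch (the slot names, placeholder format, and example dialogue act below are hypothetical, not taken from the paper's corpus):

```python
import re

def delexicalize(utterance, slot_values):
    """Replace each slot value with a SLOT_<NAME> placeholder token."""
    for slot, value in slot_values.items():
        # Case-insensitive match so surface variants like "chinese" are caught.
        pattern = re.compile(re.escape(value), re.IGNORECASE)
        utterance = pattern.sub(f"SLOT_{slot.upper()}", utterance)
    return utterance

def relexicalize(template, slot_values):
    """Inverse step: substitute real values back after generation."""
    for slot, value in slot_values.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

# Hypothetical dialogue-act slot-value pairs for a restaurant domain.
da = {"name": "Seven Hills", "food": "Chinese"}
delex = delexicalize("Seven Hills serves chinese food", da)
# delex is now "SLOT_NAME serves SLOT_FOOD food"
```

Values that cannot be matched this routinely (e.g. paraphrased or morphologically inflected values) are exactly the cases the paper's CNN reranker is meant to check for semantic consistency.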