Abstract

This paper describes a method for automatically selecting response types, such as giving a back-channel response, changing the topic, or expanding the topic, in conversational spoken dialog systems by using an LSTM-RNN-based encoder-decoder framework and multi-task learning. In our dialog system architecture, response utterances are generated after the response type has been explicitly determined, with the aim of producing more appropriate and cooperative responses than the conventional end-to-end approach, which generates response utterances directly. The response type selector consists of an encoder and two decoders that share hidden-layer states and are trained with an interpolated loss function over the two decoders. One decoder selects the response type, and the other estimates the word sequence of the response utterance. In an evaluation experiment using a corpus of dialogs between elderly people and an interviewer, the proposed method outperformed the standard single-task learning method, especially when the amount of training data was limited.
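
The interpolated loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formula or the interpolation weight, so the convex combination below, the weight name `lam`, its default value, and all helper names and toy probabilities are assumptions made for illustration only, not the authors' implementation.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])

def sequence_cross_entropy(step_probs, target_indices):
    """Mean per-token negative log-likelihood over a word sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_indices)]
    return sum(losses) / len(losses)

def interpolated_loss(type_loss, word_loss, lam=0.5):
    """Convex combination of the two decoder losses.

    lam is a hypothetical interpolation weight (assumption, not from the paper):
    lam = 1.0 trains only the response-type decoder,
    lam = 0.0 trains only the word-sequence decoder.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * type_loss + (1.0 - lam) * word_loss

# Toy outputs standing in for the two decoders (made-up numbers).
# Response types: 0 = back-channel, 1 = change topic, 2 = expand topic.
type_probs = [0.7, 0.2, 0.1]
type_loss = cross_entropy(type_probs, 0)

# Word decoder: one probability vector per output step over a 2-word vocabulary.
word_step_probs = [[0.5, 0.5], [0.25, 0.75]]
word_loss = sequence_cross_entropy(word_step_probs, [0, 1])

total = interpolated_loss(type_loss, word_loss, lam=0.5)
print(f"type={type_loss:.4f} word={word_loss:.4f} total={total:.4f}")
```

Because both decoders read the same encoder states, gradients from the word-sequence loss also shape the representation used for type selection, which is one plausible reason such a setup could help when training data is scarce.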
