Abstract

Slot filling models must be trained on human-labeled data, which are expensive to produce, and only a limited number of labeled utterances are readily available for learning. Data generation methods can increase the size of the training dataset and add variation to it by producing new instances. We propose a novel labeled utterance generation algorithm to augment training data. Our hypothesis is that the words in an utterance can be separated into two parts: slot values, which are instances of slot types, and the accompanying contexts. Our model aims to generate utterances that are diverse combinations of slot values and contexts that can appear together. To create varied utterances that satisfy a given condition, our deep generative model uses a conditional variational auto-encoder architecture. We conduct experiments on several slot filling datasets, specifically Airline Travel Information Systems (ATIS), Snips, and MIT Corpus. A quantitative analysis shows that data augmentation with the proposed model improves the F1 score for slot filling. We also demonstrate that our labeled utterance generation model yields more desirable utterances.
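The separation of an utterance into slot values and context can be illustrated with a minimal sketch. It assumes BIO-style slot tags, which the abstract does not specify, and it recombines contexts and values by plain substitution. It is not the conditional variational auto-encoder described above. The function names `split_utterance` and `fill_template` and the slot names `fromloc` and `toloc` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def split_utterance(
    tokens: List[str], tags: List[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split a BIO-tagged utterance into a context template and slot values.

    Assumption: tags follow the BIO scheme ("O", "B-<slot>", "I-<slot>").
    """
    template: List[str] = []
    values: Dict[str, List[str]] = {}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Context word: keep it in the template unchanged.
            template.append(token)
            continue
        slot = tag[2:]
        if tag.startswith("I-") and values.get(slot):
            # Continuation of the most recent value for this slot.
            values[slot][-1] += " " + token
        else:
            # Start of a new slot value: leave a placeholder in the context.
            template.append("<" + slot + ">")
            values.setdefault(slot, []).append(token)
    return template, values


def fill_template(
    template: List[str], new_values: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Insert new slot values into a context template and emit BIO tags."""
    tokens: List[str] = []
    tags: List[str] = []
    for item in template:
        if item.startswith("<") and item.endswith(">"):
            slot = item[1:-1]
            words = new_values[slot].split()
            tokens.extend(words)
            tags.extend(["B-" + slot] + ["I-" + slot] * (len(words) - 1))
        else:
            tokens.append(item)
            tags.append("O")
    return tokens, tags


# Illustrative utterance; the slot names are hypothetical.
tokens = "show flights from boston to new york".split()
tags = ["O", "O", "O", "B-fromloc", "O", "B-toloc", "I-toloc"]

template, values = split_utterance(tokens, tags)
new_tokens, new_tags = fill_template(
    template, {"fromloc": "san francisco", "toloc": "denver"}
)
```

Here `template` holds the context with placeholders and `values` holds the slot values per slot type. Substituting other values into the same context yields a new labeled utterance, which is the simplest form of the combination that the proposed model is meant to generate in a more diverse, learned way.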
