Efficient Corpus Creation Method for NLU Using Interview with Probing Questions

Kazuaki Shima, Masataka Motohashi, Rintaro Ikeshita, Takeshi Homma, Hiroaki Kokubo, Jinhua She, Yasunari Obuchi

doi:10.20965/jaciii.2019.p0947


Open Access



Abstract

This paper presents an efficient method for building a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods follow a common cycle: a subject is given a specific situation in which they operate a device by voice, and the subject then speaks one utterance to execute the task. Because each cycle yields only one utterance, these methods require many subjects to build a large-scale corpus, which increases lead time and financial cost. To solve this problem, we propose incorporating a “probing question” into the cycle. Specifically, after a subject speaks one utterance, the subject is asked to think of alternative utterances that execute the same task. In this way, we obtain many utterances from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that it reduces the number of subjects by 41% while maintaining the morphological diversity of the corpus and its morphological coverage of user utterances spoken to commercial devices. It also shows that the proposed method reduces the total time for interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can be used to build a useful corpus while reducing lead time and financial cost.
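The two evaluation criteria named in the abstract, morphological diversity and morphological coverage, can be illustrated with a minimal sketch. The abstract does not give their definitions, so the ones below are assumptions: diversity is taken as the number of distinct morphemes in a corpus, and coverage as the fraction of distinct morphemes in a reference set of device utterances that also occur in the corpus. Whitespace tokens stand in for morphemes, and the function names and sample utterances are hypothetical.

```python
from typing import Iterable, Set


def morphemes(utterance: str) -> Set[str]:
    # Assumption: lowercased whitespace tokens stand in for morphemes.
    # A real pipeline would use a morphological analyzer instead.
    return set(utterance.lower().split())


def vocabulary(utterances: Iterable[str]) -> Set[str]:
    # Union of the morpheme sets of all utterances.
    vocab: Set[str] = set()
    for utterance in utterances:
        vocab |= morphemes(utterance)
    return vocab


def diversity(corpus: Iterable[str]) -> int:
    # Assumed definition: number of distinct morphemes in the corpus.
    return len(vocabulary(corpus))


def coverage(corpus: Iterable[str], reference: Iterable[str]) -> float:
    # Assumed definition: share of the reference morphemes that the corpus contains.
    ref = vocabulary(reference)
    if not ref:
        return 0.0
    return len(ref & vocabulary(corpus)) / len(ref)


# Hypothetical data: one first utterance, then alternatives elicited
# by a probing question for the same task.
first_only = ["turn on the radio"]
with_probing = first_only + ["switch the radio on", "play the radio"]
device_logs = ["switch on the radio", "play radio"]

print(diversity(first_only), diversity(with_probing))
print(coverage(first_only, device_logs), coverage(with_probing, device_logs))
```

In this toy example the alternatives elicited from the same subject add new morphemes, so both measures rise without recruiting another subject, which is the effect the abstract attributes to probing questions.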
