Abstract

Natural language understanding (NLU) is one of the most critical components of goal-oriented dialog systems and enables innovative Big Data applications such as intelligent voice assistants (IVAs) and chatbots. While recent advances in deep learning-based NLU models have achieved significant improvements in accuracy, most existing work is monolingual or bilingual. In this work, we propose and experiment with techniques for developing multilingual NLU models. In particular, we first propose a purely language-agnostic multilingual NLU framework that combines a multilingual BERT (mBERT) encoder, a joint decoder for intent classification and slot filling, and a novel co-appearance regularization technique. We then propose three distinct language-aware multilingual NLU approaches: using the language code as an explicit input, using language-specific parameters during decoding, and using implicit language identification as an auxiliary task. We report results on a large-scale commercial IVA system trained on a diverse set of intents with very large vocabularies, as well as on a public multilingual NLU dataset. Our experiments explicitly account for code-mixing and language dissimilarity, both practical concerns in large-scale real-world IVA systems. We find that language-aware designs can improve NLU performance when language dissimilarity and code-mixing are present. Together, the empirical results and the proposed architectures provide important insights for designing multilingual NLU systems.
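The abstract describes the joint decoder and the auxiliary language-identification task only at a high level. The following is a minimal sketch of one way such a joint objective could be wired, not the authors' architecture. It assumes a [CLS]-style sentence vector at position 0 feeding the intent and language heads, one linear softmax head per task, and a weighted sum of cross-entropy losses. All names (`JointNLUHead`, `joint_loss`, `lang_weight`) are hypothetical, the encoder outputs are random stand-ins for mBERT vectors, and the co-appearance regularization is not modeled because the abstract does not define it.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Affine map W x + b, with W stored as one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def log_softmax(z: Sequence[float]) -> Vector:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the gold class under softmax(logits)."""
    return -log_softmax(logits)[target]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class JointNLUHead:
    """Hypothetical joint decoder placed on top of a multilingual encoder.

    Assumption: position 0 is a [CLS]-style sentence vector; the remaining
    positions are per-token vectors that each receive one slot label.
    """

    def __init__(self, hidden: int, n_intents: int, n_slots: int,
                 n_langs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_intent = random_matrix(n_intents, hidden, rng)
        self.b_intent = [0.0] * n_intents
        self.w_slot = random_matrix(n_slots, hidden, rng)
        self.b_slot = [0.0] * n_slots
        # Auxiliary head for implicit language identification (assumed linear).
        self.w_lang = random_matrix(n_langs, hidden, rng)
        self.b_lang = [0.0] * n_langs

    def forward(self, token_vecs: List[Vector]) -> Tuple[Vector, List[Vector], Vector]:
        cls_vec, tokens = token_vecs[0], token_vecs[1:]
        intent_logits = linear(cls_vec, self.w_intent, self.b_intent)
        slot_logits = [linear(t, self.w_slot, self.b_slot) for t in tokens]
        lang_logits = linear(cls_vec, self.w_lang, self.b_lang)
        return intent_logits, slot_logits, lang_logits


def joint_loss(head: JointNLUHead, token_vecs: List[Vector], intent: int,
               slots: List[int], lang: int, lang_weight: float = 0.1) -> float:
    """Intent CE + mean slot CE + lang_weight * language-ID CE.

    lang_weight = 0.0 drops the auxiliary term, leaving a language-agnostic
    joint intent/slot objective (no co-appearance regularizer here).
    """
    intent_logits, slot_logits, lang_logits = head.forward(token_vecs)
    if not slots or len(slot_logits) != len(slots):
        raise ValueError("need exactly one slot label per non-[CLS] token")
    loss = cross_entropy(intent_logits, intent)
    loss += sum(cross_entropy(s, y) for s, y in zip(slot_logits, slots)) / len(slots)
    loss += lang_weight * cross_entropy(lang_logits, lang)
    return loss


if __name__ == "__main__":
    rng = random.Random(42)
    hidden = 8
    # Random stand-ins for encoder outputs: [CLS] plus three tokens.
    fake_encoder_out = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(4)]
    head = JointNLUHead(hidden, n_intents=3, n_slots=5, n_langs=2)
    with_aux = joint_loss(head, fake_encoder_out, intent=1, slots=[0, 2, 2], lang=1)
    agnostic = joint_loss(head, fake_encoder_out, intent=1, slots=[0, 2, 2],
                          lang=1, lang_weight=0.0)
    print(f"with language-ID loss: {with_aux:.4f}, language-agnostic: {agnostic:.4f}")
```

Sharing the encoder and adding the language-ID term as a weighted loss keeps the auxiliary task from changing the inference path: at prediction time only the intent and slot heads would be read. The other two language-aware variants in the abstract (explicit language-code input and language-specific decoding parameters) would instead alter the forward pass and are not shown.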
