Lower resources of spoken language understanding from voice to semantics

Zhang Hao, Lv Cheng Guo

doi:10.1088/1742-6596/1486/5/052033

Abstract

Spoken language understanding (SLU) is traditionally designed as a pipeline of multiple components. First, an automatic speech recognition (ASR) module maps the speech signal to text; then a natural language understanding (NLU) module converts the recognized text into structured data such as domain, intent, and slot values. These modules are usually trained separately. End-to-end SLU, by contrast, derives the structured data directly from speech with a single model. However, end-to-end SLU that relies on a large amount of training data is difficult to achieve across different domains and different groups of people. For this reason, we introduce a low-resource end-to-end SLU model based on pre-training and combine it with capsule vectors. The experimental results show that the model's low-resource spoken language understanding is robust across different data sets.

Full Text