Abstract

To mitigate the ambiguity inherent in spoken language understanding (SLU) of an utterance, we propose contextual models that exploit temporal and content-related information to select relevant context effectively. We define two axes: ‘Awareness’ and ‘Attention Level’. Awareness comprises three methods that consider the timing or content similarity of the context; Attention Level comprises three methods that use speaker roles to weight the importance of each historic utterance. By combining one method from each axis, we build a family of contextual models designed to automatically learn, from data, the importance of previous utterances in terms of time and content. We also propose several kinds of speaker information that help improve SLU accuracy. The proposed models achieve state-of-the-art F1 scores on the Dialog State Tracking Challenge (DSTC) 4 and Loqui benchmark datasets, and an in-depth analysis confirms that the proposed methods are effective in improving SLU accuracy.
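As a loose illustration of the idea behind time- and content-aware context weighting (a minimal sketch only; the function name, embeddings, and the exponential time decay are illustrative assumptions, not the paper's learned model, which estimates these weights from data):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def contextual_representation(current, history, timestamps, now, decay=0.5):
    """Combine historic utterance embeddings into one context vector.

    Content awareness: dot-product similarity between the current
    utterance embedding and each historic utterance embedding.
    Time awareness: a score penalty that grows with elapsed time.
    (Hypothetical sketch -- the actual models learn such weights.)
    """
    history = np.array(history)
    sims = history @ current                          # content relevance
    recency = -decay * (now - np.array(timestamps))   # time relevance
    weights = softmax(sims + recency)                 # attention over history
    context = weights @ history                       # weighted sum of context
    return context, weights

# Example: two historic utterances, the first more similar in content.
current = np.array([1.0, 0.0])
history = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
context, weights = contextual_representation(current, history,
                                             timestamps=[0.0, 1.0], now=2.0)
```

A speaker-role-aware variant would additionally condition the attention scores on who produced each historic utterance (e.g., guide vs. tourist in DSTC 4), rather than treating all speakers identically.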
