Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

Yingmei Guo, Ming Gong, Mingxing Xu, Linjun Shou, Zhiyong Wu, Daxin Jiang, Jian Pei

doi:10.18653/v1/2021.emnlp-main.259

Open Access

Publication Date: Jan 1, 2021
Citations: 2
License type: CC BY

#Spoken Language Understanding #Data Augmentation Approaches

Abstract

Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy and thus impede the performance of SLU models. In this paper, we focus on mitigating noise in augmented data. We develop a denoising training approach in which multiple models are trained on data produced by various augmentation methods and provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
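
The abstract does not specify how the models supervise each other, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes two models that each output a probability distribution over intent labels, and it combines the usual loss on the noisy augmented label with a term that pulls each model toward its peer's prediction. The function names, the KL-divergence agreement term, and the `peer_weight` parameter are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the (possibly noisy) augmented label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same label set.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_supervision_loss(own_probs, peer_probs, noisy_label, peer_weight=0.5):
    """Hypothetical per-example loss for one model.

    own_probs:   this model's predicted distribution
    peer_probs:  prediction of a peer trained on differently augmented data
    noisy_label: label index from this model's own augmented data set
    peer_weight: assumed mixing weight between the two terms (0 = ignore peer)
    """
    supervised = cross_entropy(own_probs, noisy_label)
    agreement = kl_divergence(peer_probs, own_probs)
    return (1 - peer_weight) * supervised + peer_weight * agreement


# Example: both models favour label 0, but the augmented label says 1.
model_a = [0.7, 0.2, 0.1]
model_b = [0.6, 0.3, 0.1]
loss_a = mutual_supervision_loss(model_a, model_b, noisy_label=1)
loss_b = mutual_supervision_loss(model_b, model_a, noisy_label=1)
```

In this sketch, a label that both models disagree with contributes less to the loss as `peer_weight` grows, which is one simple way a peer's signal could soften the effect of a noisy label.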
