Noise processing and multitask learning for far‐field dialect classification

Hai Wang, Kan Zhang, Chenguang Qin, Yan Wang, Yuhui Ma, Jie Ren, Ling Gao

doi:10.1002/cpe.7274

Abstract

Deep learning has achieved great success in speech recognition. The spread of embedded devices such as smart speakers, together with growing demand for dialect interaction, poses major challenges for far‐field speech recognition and dialect recognition. To address dialect recognition on embedded devices under far‐field conditions, we propose a deep neural network model with multitask learning. First, the audio is passed through an end‐to‐end noise reduction model to improve recognition. Then we define dialect recognition as the main task and dialect area recognition as the auxiliary task, and use multitask learning to improve the accuracy of dialect classification. Experimental results show that the end‐to‐end noise reduction model improves recognition accuracy, by up to 7.54% over the baseline, and that the multitask learning model improves the accuracy of dialect recognition by about 5%.
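
The main-task/auxiliary-task setup described in the abstract can be illustrated with a minimal sketch of a combined loss. This is not the authors' implementation: the function names, the weighted-sum form, and the auxiliary weight `aux_weight=0.3` are all assumptions chosen for illustration, and the abstract does not state how the two task losses are combined.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(dialect_logits, dialect_label,
                   area_logits, area_label, aux_weight=0.3):
    """Hypothetical combined loss: main task (dialect) plus a weighted
    auxiliary task (dialect area). aux_weight is an assumed value."""
    main = cross_entropy(dialect_logits, dialect_label)
    aux = cross_entropy(area_logits, area_label)
    return main + aux_weight * aux


# Example: 4 candidate dialects grouped into 2 dialect areas (made-up scores).
loss = multitask_loss([2.0, 0.5, 0.1, -1.0], 0, [1.5, -0.5], 0)
```

In a sketch like this, both output heads would share the same acoustic encoder, so the gradient from the auxiliary dialect-area loss also shapes the features used by the main dialect classifier.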

Full Text