Abstract

Multi-task learning (MTL) involves training two or more tasks over a shared representation. The present work applies MTL to audio-visual automatic speech recognition (AVASR). The primary task of the MTL system learns a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. An auxiliary task, which maps visual features to frame labels obtained from a visual GMM/HMM model, is trained jointly with the primary task. Results of a baseline hybrid DNN-HMM AVASR model are compared with the MTL model at various levels of babble noise. The results indicate that MTL is useful at higher noise levels: compared with the baseline model, an approximate 7% relative improvement in WER is reported at −3 dB SNR.
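The joint objective described above can be sketched as a weighted sum of a primary cross-entropy loss (acoustic frame labels from audio-visual features) and an auxiliary cross-entropy loss (visual frame labels from visual features). The shapes, label counts, interpolation weight, and single shared layer below are all assumptions for illustration, not the paper's exact hybrid DNN-HMM architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of frame-level logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy batch of frames (hypothetical dimensions).
av_feats = rng.normal(size=(8, 120))            # audio-visual fused features
v_feats = rng.normal(size=(8, 40))              # visual-only features
acoustic_labels = rng.integers(0, 500, size=8)  # from acoustic GMM/HMM alignment
visual_labels = rng.integers(0, 500, size=8)    # from visual GMM/HMM alignment

# One shared hidden layer feeding two task-specific output layers.
W_shared = rng.normal(size=(160, 256)) * 0.01
W_primary = rng.normal(size=(256, 500)) * 0.01
W_auxiliary = rng.normal(size=(256, 500)) * 0.01

hidden = np.maximum(np.concatenate([av_feats, v_feats], axis=1) @ W_shared, 0)
loss_primary = cross_entropy(hidden @ W_primary, acoustic_labels)
loss_auxiliary = cross_entropy(hidden @ W_auxiliary, visual_labels)

alpha = 0.8  # assumed interpolation weight between the two tasks
mtl_loss = alpha * loss_primary + (1 - alpha) * loss_auxiliary
print(float(mtl_loss))
```

Minimizing `mtl_loss` updates the shared layer with gradients from both tasks; the auxiliary visual task acts as a regularizer, which is consistent with the observed gains at low SNR where the acoustic signal is least reliable.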
