End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech

Junqi Chen, Xiao-Lei Zhang, Zhiyong Huang, Susanto Rahardja, Mou Wang

doi:10.1109/icassp43922.2022.9747306

Abstract

Improving the performance of automatic speech recognition (ASR) in adverse acoustic environments is a long-standing challenge. Although many robust ASR systems based on conventional microphones have been developed, their performance with air-conducted (AC) speech is still far from satisfactory in low signal-to-noise-ratio (SNR) environments. Bone-conducted (BC) speech is relatively insensitive to ambient noise and, as an auxiliary source, has the potential to improve ASR performance in such low-SNR environments. In this paper, we propose a conformer-based multi-modal speech recognition system. It uses a conformer encoder and a transformer-based truncated decoder to extract the semantic information from the AC and BC channels, respectively. The semantic information of the two channels is re-weighted and integrated by a novel multi-modal transducer. Experimental results show the effectiveness of the proposed method. For example, in a 0 dB SNR environment, it yields a character error rate over 59.0% lower than a noise-robust baseline that uses the AC channel only, and over 12.7% lower than a multi-modal baseline that takes the concatenated features of AC and BC speech as input.
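
The results above are reported as character error rate (CER). As a point of reference, the following is a minimal sketch of how CER is commonly computed: the character-level Levenshtein (edit) distance between the reference transcript and the recognizer output, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative assumptions, not taken from the paper, and the paper's own scoring setup (text normalization, handling of spaces and punctuation) is not specified in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For instance, `character_error_rate("abcd", "abxd")` is 0.25: one substitution over four reference characters. Note that CER can exceed 1.0 when the hypothesis contains many insertions.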

Similar Papers

Presentation method as air- and bone-conducted speech for delayed auditory feedback
Teruki Toya ... Masashi Unoki
The Journal of the Acoustical Society of America | VOL. 141
01 May 2017

End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus
Mou Wang ... Junqi Chen
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 31
01 Jan 2023

Speaker-Independent Spectral Enhancement for Bone-Conducted Speech
Liangliang Cheng ... Liang Tao
Algorithms | VOL. 16
09 Mar 2023

Bone-conducted Speech Enhancement Using Vector-quantized Variational Autoencoder and Gammachirp Filterbank Cepstral Coefficients
Quoc-Huy Nguyen ... Masashi Unoki
29 Aug 2022
