Abstract

Building a robust Automatic Speech Recognition (ASR) system and improving recognition accuracy in adverse conditions remain challenging tasks. One way to improve the robustness of an ASR system is to combine information from multiple sources (streams). The key contribution of this work is a multi-stream approach that handles the multiple inputs at the model level. Standard microphone (Sm), throat microphone (Tm), and lip reading (Lr) are the source streams used. This work explores a static weighted two-stream HMM (TSH) and a multi-stream HMM (MSH) model for bimodal and multimodal systems. Syllabic units of a Hindi language database, categorized as Vowel, Place of Articulation (POA), and Manner of Articulation (MOA), are used for training and testing. In this study, four types of TSH are proposed for the bimodal combinations ((Sm+Tm), (Tm+Lr), (Sm+Lr), (Lm+Lm)), and one type of MSH is proposed for the multimodal (Sm+Tm+Lr) system, in both synchronous and asynchronous manners. Mel Frequency Cepstral Coefficient (MFCC) features are used for the Sm and Tm signals. Combined pixel- and motion-based features (DCT/DWT-MHI) are used for the Lr signals. Of these two features, DWT outperforms DCT and is therefore used as the feature for visual speech. Experiments were conducted for both bimodal and multimodal systems. The proposed MSH approach shows improvements of 1.36%, 6.21%, and 5.8% in recognition accuracy for the Vowel, POA, and MOA categories, respectively, as compared to the bimodal systems.
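In a static weighted multi-stream HMM, model-level fusion amounts to raising each stream's emission likelihood to a fixed stream weight, i.e., summing weighted per-stream log-likelihoods at each state. The Python sketch below illustrates this combination for a single HMM state; the single-Gaussian state models, the 2-D feature vectors, and the weight values are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of static-weight stream fusion in a multi-stream HMM.
# Stream names follow the abstract (Sm, Tm, Lr); the Gaussian state
# models and weight values below are illustrative assumptions only.
import numpy as np
from scipy.stats import multivariate_normal

def combined_log_likelihood(observations, state_models, weights):
    """Combine per-stream emission log-likelihoods for one HMM state.

    observations : dict mapping stream name -> feature vector
    state_models : dict mapping stream name -> (mean, covariance)
    weights      : dict mapping stream name -> static stream weight
                   (assumed to sum to 1)
    """
    total = 0.0
    for stream, obs in observations.items():
        mean, cov = state_models[stream]
        # Weighted per-stream Gaussian log-likelihood: the stream weight
        # is the exponent in the product-of-likelihoods fusion rule.
        total += weights[stream] * multivariate_normal.logpdf(obs, mean, cov)
    return total

# Toy example: three streams with 2-D single-Gaussian state models.
rng = np.random.default_rng(0)
obs = {s: rng.normal(size=2) for s in ("Sm", "Tm", "Lr")}
models = {s: (np.zeros(2), np.eye(2)) for s in ("Sm", "Tm", "Lr")}
weights = {"Sm": 0.5, "Tm": 0.3, "Lr": 0.2}  # hypothetical static weights
print(combined_log_likelihood(obs, models, weights))
```

A bimodal TSH is the two-stream special case of the same rule; with weights (1, 0) or (0, 1) it degenerates to a single-stream HMM.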
