Abstract

This paper studies the effect of combining evidence from multiple modes of speech on the recognition of different categories of sounds. Multimodal speech recognition systems are built by combining the acoustic and visual cues from (lip-radiated) normal microphone speech, throat microphone speech, and lip reading for the recognition of the highly confusable 145 consonant-vowel (CV) units of the Hindi language. The performance of the multimodal systems is compared with that of the unimodal systems for the recognition of sounds based on their place of articulation (POA) and manner of articulation (MOA), as well as their associated vowels. This comparison shows that although the multimodal ASR systems rely on the presence of complementary speech-related acoustic and visual cues in the different modes, not all evidence is complementary. Bimodal systems that combine visual cues from lip reading are shown to improve the recognition of sounds based on POA and MOA, but to degrade the recognition of vowels. The study shows that, compared to the standard Automatic Speech Recognition (ASR) system, the best multimodal system, which combines the two acoustic cues as well as the visual cue, improves the recognition of the POA category by 11%, the MOA category by 3%, and vowels by 2%. However, the study also shows the need to explore better fusion techniques to overcome the absence of complementary evidence in certain categories of sounds, especially in bimodal systems.
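The abstract does not specify how the evidence from the three modes is combined, but a common approach in multimodal ASR is weighted late fusion of per-class scores from the unimodal recognizers. The sketch below illustrates that idea under stated assumptions: the function name, the modality weights, and the random scores are all hypothetical placeholders, not the authors' method.

```python
import numpy as np

# Minimal late-fusion sketch (illustrative only): combine per-class
# log-likelihoods from three unimodal recognizers -- normal microphone,
# throat microphone, and lip reading -- using fixed modality weights.
# The weights and scores here are made up for demonstration.

def late_fusion(log_likelihoods: dict[str, np.ndarray],
                weights: dict[str, float]) -> int:
    """Return the index of the CV unit with the highest fused score.

    log_likelihoods: modality name -> array of per-class log-likelihoods
                     (one entry per CV class, e.g. 145 for Hindi CV units).
    weights:         modality name -> non-negative fusion weight.
    """
    # Weighted sum of the modality score vectors, then pick the best class.
    fused = sum(weights[m] * ll for m, ll in log_likelihoods.items())
    return int(np.argmax(fused))

# Toy usage with random scores for the 145 Hindi CV classes.
rng = np.random.default_rng(0)
scores = {m: rng.normal(size=145)
          for m in ("normal_mic", "throat_mic", "lip_reading")}
weights = {"normal_mic": 0.5, "throat_mic": 0.3, "lip_reading": 0.2}
print(late_fusion(scores, weights))
```

In such a scheme, the per-category behavior reported in the abstract (visual cues helping POA and MOA but hurting vowels) would correspond to the fixed weights being suboptimal for some sound categories, which is one motivation for exploring better fusion techniques.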
