Abstract

Automatic audio-visual speech recognition (AVSR) systems have recently achieved remarkable success, surpassing human performance on limited-vocabulary tasks, particularly in acoustically noisy conditions. Speech recognition systems that jointly process audio and video information are being actively researched and developed worldwide. However, no studies have analyzed how a speaker's emotional state (anger, disgust, fear, happy, neutral, and sad) and, most importantly, the intensity level of the emotion (low - LO, medium - MD, high - HI) affect automatic lip-reading. The relevance of this research topic is therefore difficult to overstate, and it requires detailed study. In this paper, we present a novel approach to emotional speech lip-reading that includes estimation of a speaker's emotion and its intensity level. The proposed approach uses visual speech data to detect the type and intensity of a person's emotion and, based on this information, assigns the input to one of the trained emotional lip-reading models. This essentially addresses the multi-emotional lip-reading problem that arises in most real-life scenarios. By taking the intensity of the pronounced audio-visual speech into account, the proposed approach improves state-of-the-art results by up to 8.2% in terms of accuracy. This research is a first step toward the creation of emotion-robust speech recognition systems and leaves open a wide field for further research.
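The routing idea described in the abstract (detect the emotion type and intensity from visual speech, then dispatch the clip to the matching emotion-specific lip-reading model) can be illustrated with a minimal sketch. All names, signatures, and the fallback policy below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: an emotion/intensity classifier selects which
# emotion-specific lip-reading model decodes the visual speech.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad"]
INTENSITIES = ["LO", "MD", "HI"]

# A "model" is abstracted as a callable from lip-region frames to a phrase.
LipReadingModel = Callable[[List[bytes]], str]


@dataclass
class EmotionEstimate:
    emotion: str    # one of EMOTIONS
    intensity: str  # one of INTENSITIES


def classify_emotion(frames: List[bytes]) -> EmotionEstimate:
    """Placeholder visual emotion/intensity classifier (assumed component)."""
    return EmotionEstimate(emotion="neutral", intensity="MD")


def recognize(frames: List[bytes],
              models: Dict[Tuple[str, str], LipReadingModel]) -> str:
    """Route the clip to the lip-reading model trained for the detected
    emotion and intensity level; fall back to a neutral model if missing."""
    estimate = classify_emotion(frames)
    key = (estimate.emotion, estimate.intensity)
    model = models.get(key, models[("neutral", "MD")])
    return model(frames)


if __name__ == "__main__":
    # Dummy per-(emotion, intensity) models for illustration only.
    models = {(e, i): (lambda f, e=e, i=i: f"<phrase decoded by {e}/{i} model>")
              for e in EMOTIONS for i in INTENSITIES}
    print(recognize([b"frame0", b"frame1"], models))
```

In practice the per-emotion models would be separately trained lip-reading networks, and the classifier's confidence could determine whether to trust the emotion-specific model or the neutral one; that policy is an assumption here.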
