Abstract

This paper proposes a novel multimodal audio-visual speech recognition generative adversarial network (multimodal AVSR GAN) architecture to improve both the energy efficiency and the AVSR classification accuracy of artificial-intelligence Internet of Things (IoT) applications. Audio-visual speech recognition (AVSR) is a classical multimodal task that is commonly used in IoT and embedded systems. Examples of suitable IoT applications include in-cabin speech recognition for driving systems, AVSR in augmented-reality environments, and interactive applications such as virtual aquariums. Applying multimodal sensor data in IoT settings requires efficient information processing to meet the hardware constraints of IoT devices. The proposed multimodal AVSR GAN architecture is composed of a generator and a discriminator, each of which is a two-stream network whose streams process the audio information and the visual information, respectively. To validate this approach, we trained on augmented data from the well-known LRS2 (Lip Reading Sentences 2) and LRS3 datasets, and tested on the original data. The experimental results showed that the proposed multimodal AVSR GAN architecture improved AVSR classification accuracy. Furthermore, we survey the domain of GANs and provide a concise summary of the GANs discussed.
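The abstract's key structural idea is that both the generator and the discriminator are two-stream networks, with one stream per modality. The following minimal NumPy sketch illustrates that layout only; all dimensions, layer sizes, and the simple dense/ReLU layers are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# Hypothetical dimensions for illustration only (not taken from the paper).
NOISE_DIM, AUDIO_DIM, VISUAL_DIM, HIDDEN = 16, 32, 64, 24

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """One randomly initialized affine layer: (weights, bias)."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)

def relu(x):
    return np.maximum(x, 0.0)

class TwoStreamGenerator:
    """Maps a shared noise vector to a paired (audio, visual) feature
    sample through two separate projection heads, one per stream."""
    def __init__(self):
        self.audio_head = dense(NOISE_DIM, AUDIO_DIM)
        self.visual_head = dense(NOISE_DIM, VISUAL_DIM)

    def __call__(self, z):
        (Wa, ba), (Wv, bv) = self.audio_head, self.visual_head
        return relu(z @ Wa + ba), relu(z @ Wv + bv)

class TwoStreamDiscriminator:
    """Processes the audio and visual features in separate streams,
    then fuses them into a single real/fake probability."""
    def __init__(self):
        self.audio_stream = dense(AUDIO_DIM, HIDDEN)
        self.visual_stream = dense(VISUAL_DIM, HIDDEN)
        self.fuse = dense(2 * HIDDEN, 1)

    def __call__(self, audio, visual):
        (Wa, ba), (Wv, bv), (Wf, bf) = (
            self.audio_stream, self.visual_stream, self.fuse)
        ha = relu(audio @ Wa + ba)          # audio stream
        hv = relu(visual @ Wv + bv)         # visual stream
        logit = np.concatenate([ha, hv], axis=-1) @ Wf + bf  # fusion
        return 1.0 / (1.0 + np.exp(-logit))  # sigmoid score

G, D = TwoStreamGenerator(), TwoStreamDiscriminator()
z = rng.standard_normal((4, NOISE_DIM))      # batch of 4 noise vectors
fake_audio, fake_visual = G(z)
score = D(fake_audio, fake_visual)
print(fake_audio.shape, fake_visual.shape, score.shape)
```

In adversarial training, the discriminator would additionally be shown real audio-visual pairs from the (augmented) LRS2/LRS3 data, and both networks would be updated with a standard GAN loss; that loop is omitted here for brevity.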
