Visual speech recognition using compact hypercomplex neural networks

Iason Ioannis Panagos, Giorgos Sfikas, Christophoros Nikou

doi:10.1016/j.patrec.2024.09.002

Abstract

Recent progress in visual speech recognition, driven by advances in deep learning and large-scale public datasets, has led to impressive performance compared to human professionals. These systems have numerous potential real-life applications and can greatly benefit many individuals. However, most are not designed with practicality in mind: they require large models and powerful hardware, which limits their applicability in resource-constrained environments and other real-world tasks. In addition, few works focus on developing lightweight systems that can be deployed in such conditions. To address these issues, we propose compact networks built on hypercomplex layers, which use a sum of Kronecker products to reduce overall parameter counts and model sizes. We train and evaluate our proposed models on the largest public dataset for single-word speech recognition in English. Our experiments show that high compression rates are achievable with a minimal accuracy drop, indicating the method’s potential for practical applications in lower-resource environments. Code and models are available at https://github.com/jpanagos/vsr_phm.
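The abstract's "sum of Kronecker products" can be illustrated with a minimal sketch. It assumes the common parameterized hypercomplex multiplication (PHM) form, in which a weight matrix is built as W = sum_i kron(A_i, F_i) with n small n-by-n matrices A_i and n blocks F_i of size (d_out/n) by (d_in/n). The function names, the value n = 4, and the layer sizes below are illustrative assumptions, not taken from the paper; the authors' actual implementation is in the linked repository.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def phm_weight(a_mats, f_mats):
    """Assemble W = sum_i kron(A_i, F_i) (assumed PHM-style construction)."""
    w = kron(a_mats[0], f_mats[0])
    for a, f in zip(a_mats[1:], f_mats[1:]):
        w = mat_add(w, kron(a, f))
    return w


def phm_param_count(n, d_out, d_in):
    """Learnable parameters: n matrices of n*n plus n blocks of (d_out/n)*(d_in/n)."""
    return n * n * n + n * (d_out // n) * (d_in // n)


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes chosen only for illustration.
n, d_out, d_in = 4, 8, 12
rng = random.Random(0)
a_mats = [rand_matrix(n, n, rng) for _ in range(n)]
f_mats = [rand_matrix(d_out // n, d_in // n, rng) for _ in range(n)]
W = phm_weight(a_mats, f_mats)  # full d_out x d_in weight matrix

# Parameter comparison for a larger, equally hypothetical 512 x 512 layer.
dense_params = 512 * 512
phm_params = phm_param_count(4, 512, 512)
print(len(W), len(W[0]), dense_params, phm_params)
```

Under these assumptions the assembled matrix has the same shape as a dense weight, while the stored parameters shrink by roughly a factor of n for large layers; the extra n^3 term is negligible there.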

More From: Pattern Recognition Letters

Similar Papers

Appearance and shape-based hybrid visual feature extraction: toward audio–visual automatic speech recognition
Saswati Debnath ... Pinki Roy
Signal, Image and Video Processing | VOL. 15
11 Jun 2020

End-to-End Sentence-Level Multi-View Lipreading Architecture with Spatial Attention Module Integrated Multiple CNNs and Cascaded Local Self-Attention-CTC
Sanghun Jeon ... Mun Sang Kim
Sensors | VOL. 22
09 May 2022

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Yuanhang Zhang ... Xilin Chen
01 Nov 2020

An Improved Visual Speech Recognition of Isolated Words using Combined Pixel and Geometric Features
N Radha ... A Nayeemulla Khan
Indian Journal of Science and Technology | VOL. 9
24 Nov 2016
