Intelligent speech recognition algorithm in multimedia visual interaction via BiLSTM and attention mechanism

Yican Feng

doi:10.1007/s00521-023-08959-2

Abstract

With the rapid development of information technology, multimedia integration platforms are being applied ever more widely, and speech recognition has become an important subject in multimedia visual interaction. Recognition accuracy depends on many factors, two of which are the acoustic features used to represent speech and the recognition model itself. Speech data are complex and variable, yet most methods extract only a single type of feature to represent the signal, and a single feature cannot express all of the hidden information. A strong recognition model, in turn, learns the characteristic information in speech more effectively and so improves performance. This work proposes a new speech recognition method for multimedia visual interaction. First, to address the fact that a single feature cannot fully represent complex speech, it proposes three feature fusion structures that extract speech information from different angles, yielding three different fusion features built from low-level features and higher-level sparse representations. Second, it draws on the strong learning ability of neural networks and the weight-allocation mechanism of attention by feeding the fusion features into a bidirectional long short-term memory (BiLSTM) network with attention. The fusion features carry more speech information and are strongly discriminative, and attention assigns larger weights to the more informative features, increasing their influence on the predicted value and further improving performance. Finally, systematic experiments on the proposed method verify its feasibility.
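
The abstract does not specify the fusion operator, the sparse-coding method, the layer sizes, or the form of attention, so the pure-Python sketch below is illustrative only and is not the author's implementation. It assumes frame-wise concatenation as the fusion step, a single soft-thresholding step against a hypothetical random dictionary `D` as a crude stand-in for a sparse representation, and additive attention pooling over BiLSTM outputs. All function names, dimensions, and the random (untrained) weights are hypothetical. It shows how a BiLSTM pairs forward and backward hidden states per frame and how larger attention weights give a frame more influence on the pooled vector that a classifier would consume.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, x):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def soft_threshold(values, lam):
    # L1 proximal operator: shrinks each entry toward zero, zeroing small ones.
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def fuse(low_level, sparse_codes):
    # Hypothetical fusion: concatenate the two feature streams frame by frame.
    return [a + b for a, b in zip(low_level, sparse_codes)]


def make_lstm_params(input_dim, hidden_dim, rng):
    # One weight matrix over [x_t ; h_{t-1}] and one bias vector per gate.
    d = input_dim + hidden_dim
    return {g: (rand_matrix(hidden_dim, d, rng), [0.0] * hidden_dim) for g in "ifog"}


def lstm_pass(seq, params, hidden_dim):
    """Unidirectional LSTM; returns the hidden state at every time step."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in seq:
        z = x + h  # list concatenation [x_t ; h_{t-1}]
        gate = {}
        for g, (W, b) in params.items():
            pre = [p + bias for p, bias in zip(matvec(W, z), b)]
            # Candidate cell input uses tanh; input/forget/output gates use sigmoid.
            gate[g] = [math.tanh(p) if g == "g" else sigmoid(p) for p in pre]
        c = [f * ci + i * gi for f, ci, i, gi in zip(gate["f"], c, gate["i"], gate["g"])]
        h = [o * math.tanh(ci) for o, ci in zip(gate["o"], c)]
        outputs.append(h)
    return outputs


def bilstm(seq, fwd, bwd, hidden_dim):
    forward = lstm_pass(seq, fwd, hidden_dim)
    # Run the second LSTM on the reversed sequence, then re-align it in time.
    backward = lstm_pass(seq[::-1], bwd, hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # [h_fwd ; h_bwd]


def attention_pool(states, W_a, v_a):
    """Additive attention: e_t = v^T tanh(W h_t), alpha = softmax(e)."""
    scores = [sum(v * math.tanh(u) for v, u in zip(v_a, matvec(W_a, h))) for h in states]
    m = max(scores)  # subtract the max for a stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Frames with larger weights contribute more to the pooled context vector.
    context = [sum(w * h[k] for w, h in zip(weights, states)) for k in range(len(states[0]))]
    return weights, context


if __name__ == "__main__":
    rng = random.Random(0)
    T, low_dim, atoms, hidden = 5, 4, 6, 3
    low_level = [[rng.gauss(0, 1) for _ in range(low_dim)] for _ in range(T)]
    D = rand_matrix(atoms, low_dim, rng, scale=1.0)  # hypothetical dictionary
    sparse = [soft_threshold(matvec(D, x), lam=0.5) for x in low_level]
    fused = fuse(low_level, sparse)
    fwd = make_lstm_params(low_dim + atoms, hidden, rng)
    bwd = make_lstm_params(low_dim + atoms, hidden, rng)
    states = bilstm(fused, fwd, bwd, hidden)
    W_a = rand_matrix(hidden, 2 * hidden, rng)
    v_a = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    weights, context = attention_pool(states, W_a, v_a)
    print("attention weights:", [round(w, 3) for w in weights])
    print("context size:", len(context))
```

The attention weights always sum to 1, so the pooled context is a convex combination of the per-frame BiLSTM states; raising one frame's score shifts the context toward that frame. In a trained system the weights, the dictionary, and the fusion would be learned or chosen by the method itself rather than drawn at random as here.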

Journal: Neural Computing and Applications
Publication Date: Nov 29, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

A new speech corpus of super-elderly Japanese for acoustic modeling
Meiko Fukuda ... Norihide Kitaoka
Computer Speech & Language | VOL. 77 | 24 Jun 2022

Improving Speech Recognition Performance in Noisy Environments by Enhancing Lip Reading Accuracy
Dengshi Li ... Yu Gao
Sensors | VOL. 23 | 11 Feb 2023

Intelligent Robot English Speech Recognition Method Based on Online Database
Yong Wu ... Guicang Li
Journal of Information & Knowledge Management | VOL. 21 | 17 May 2022

Study of Artificial Intelligence Flight Co-Pilot Speech Recognition Technology
Lin Wei ... Liqing He
14 Oct 2020