Improving the Learning Power of Artificial Intelligence Using Multimodal Deep Learning

Eugene Yu Shchetinin, Leonid Sevastianov

doi:10.1051/epjconf/202124801017

Abstract

Computer paralinguistic analysis is widely used in security systems, biometric research, call centers and banks. Paralinguistic models estimate physical properties of the voice, such as pitch, intensity, formants and harmonics, to classify emotions. The main goal is to find features that are robust to outliers while retaining the variety of human voice properties. Moreover, the model must be able to estimate features on a time scale for effective analysis of voice variability. This paper describes a paralinguistic model based on a Bidirectional Long Short-Term Memory (BLSTM) neural network, trained for vocal-based emotion recognition. The main advantage of this architecture is that each module of the network consists of several interconnected layers, which lets it recognize flexible long-term dependencies in data, an important property for vocal analysis. We explain the architecture of the bidirectional neural network model and its main advantages over regular neural networks, and compare experimental results of the BLSTM network with other models.

Highlights

Paralinguistics is a branch of linguistics whose main scope is the analysis of non-verbal aspects of language, such as tempo, pitch, tone and intonation
This paper describes a paralinguistic model based on a Bidirectional Long Short-Term Memory (BLSTM) neural network, trained for vocal-based emotion recognition
In this paper we presented a Bidirectional Long Short-Term Memory (BLSTM) neural network applied to voice-based emotion classification

Summary

Introduction

Paralinguistics is a branch of linguistics whose main scope is the analysis of non-verbal aspects of language, such as tempo, pitch, tone and intonation. Various models that estimate physical properties of the voice, such as pitch, volume and harmonics, are used to classify emotions. Such classifiers usually serve as parts of security systems, mobile assistants and biometric research algorithms [3,4]. The model must be able to estimate features on a time scale for effective analysis of voice variability. This is done by extracting features with a sliding-window scheme, which solves the problem of data normalization and helps to reduce overfitting [6-9]. Another challenge is the absence of a uniform standard for human emotion types.
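The sliding-window scheme mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the summary does not say which features or window sizes were used, so the two per-window features (RMS intensity and zero-crossing rate), the 400-sample window, the 160-sample hop and the function names are all assumptions chosen for illustration.

```python
import math


def sliding_windows(signal, window_size, hop_size):
    """Yield successive windows of `window_size` samples, `hop_size` apart."""
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    for start in range(0, len(signal) - window_size + 1, hop_size):
        yield signal[start:start + window_size]


def window_features(window):
    """Return (rms_intensity, zero_crossing_rate) for one window.

    Both features are illustrative stand-ins, not the paper's feature set.
    """
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(window) - 1) if len(window) > 1 else 0.0
    return rms, zcr


def extract_features(signal, window_size=400, hop_size=160):
    """Turn a 1-D signal into a time-ordered list of per-window features.

    The defaults correspond to 25 ms windows with a 10 ms hop at 16 kHz,
    which is an assumed configuration.
    """
    return [window_features(w)
            for w in sliding_windows(signal, window_size, hop_size)]


# Synthetic input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
```

Each element of `features` describes one time step, so the resulting list is the kind of sequence a recurrent model such as a BLSTM would consume frame by frame.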

Bidirectional neural network architecture

Forget gate

Vanishing gradient problem

Feature extraction and dataset

Experiment results

Findings

Conclusion


Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2021
License type: CC BY 4.0
