Abstract

Speech is a principal tool of human communication. Many factors, such as age, emotion, gender, and voice pitch, influence the features of speech. Voice intonation conveys more than textual meaning alone: the same sentence pronounced in two different ways can have two completely different meanings. This paper describes Kohonen networks as a classifier of Polish emotional speech. It also presents the use of the Discrete Wavelet Transform (DWT) and an innovative approach to scaleogram processing. The Mexican Hat wavelet and the Haar wavelet were used in the research. All simulations were carried out in MATLAB 2016 with the Neural Network Toolbox. More than 9000 simulations were run over the course of the research. Three different speech databases were used. One of them was recorded by professional actors (four women and four men) and contains 240 WAV files. The other two are the results of researchers' work. The structure of each Kohonen network depends on the decomposition level of the speech signal and on the scaleogram division. The following emotional states were considered: anger, joy, sadness, boredom, fear, and the neutral state. The results achieved ranged from 68% to 80%, depending on the wavelet used, the speech signal, and the signal decomposition level.

Highlights

  • Recognition of a speaker's emotional state based on speech signal processing is a relatively new issue, but its significance has been increasing rapidly

  • One reason for this trend is the rapid development of systems based on Brain-Computer Interfaces, as well as Virtual Reality (VR) environments [1]

  • The most complicated issue in Polish emotional speech recognition is the number of emotional states that should be detected

Summary

Introduction

Recognition of a speaker's emotional state based on speech signal processing is a relatively new issue, but its significance has been increasing rapidly. The appearance of a particular emotion changes the signal, from its frequency to the shape of the oscillogram. An innovative system for processing Polish emotional speech signals is described. The third part describes the signal processing algorithm, the research methods, and the parameters for Polish emotional speech, and presents the results obtained along with suggestions for improving the adopted research methods.
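
The abstract mentions decomposing the speech signal with the Haar wavelet at several levels. As a rough illustration only, the sketch below shows what a multi-level Haar DWT might look like in plain Python. It is not the authors' MATLAB implementation: the function names, the padding rule for odd-length input, and the choice of per-level energy as a feature are all assumptions made for this example.

```python
import math


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    samples = list(signal)
    if len(samples) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        samples.append(samples[-1])
    s = 1 / math.sqrt(2)  # orthonormal Haar scaling factor
    approx = [(samples[i] + samples[i + 1]) * s for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) * s for i in range(0, len(samples), 2)]
    return approx, detail


def haar_band_energies(signal, levels):
    """Hypothetical feature extractor (not from the paper).

    Returns the energy of the detail coefficients at each level,
    followed by the energy of the final approximation.
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


if __name__ == "__main__":
    print(haar_band_energies([4, 2, 6, 6], levels=2))
```

In this sketch the feature vector has `levels + 1` entries, so its length grows with the decomposition level. That is one plausible way the input size of a classifier could depend on the decomposition level, but the source does not state how the authors built their network inputs.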

Analysis of issues
Description of used databases
Wavelet Transformation
Speech signal processing scheme
Kohonen Networks Architecture and achieved results
Findings
Conclusions