Abstract

To recognize stress and emotion, most existing methods only observe and analyze speech patterns from present-time features. However, an emotion (especially stress) can change when triggered by an event while speaking. To address this issue, we propose a novel method for predicting stress and emotions by analyzing prior emotional states. We named this method the deep time-delay Markov network (DTMN). Structurally, the proposed DTMN contains a hidden Markov model (HMM) and a time-delay neural network (TDNN). We evaluated the effectiveness of the proposed DTMN by comparing it with several state transition methods in predicting an emotional state from time-series (sequential) speech data of the SUSAS dataset. The experimental results show that the proposed DTMN accurately predicts present emotional states and outperforms the baseline systems in terms of prediction error rate (PER). We then modeled the emotional state transition using a finite Markov chain based on the prediction result. We also conducted an ablation experiment to observe the effect of different HMM values and TDNN parameters on the prediction result and the computational training time of the proposed DTMN.
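As a rough illustration of the finite Markov chain step mentioned above, the sketch below estimates a transition matrix from a sequence of predicted emotional states by counting consecutive pairs and normalizing each row. The state labels, the example sequence, and the function name `estimate_transition_matrix` are assumptions made for illustration only. They are not the SUSAS classes or the procedure used in the paper.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(states, labels):
    """Estimate first-order transition probabilities from a state sequence.

    Counts each (previous, current) pair and normalizes per source state.
    """
    counts = defaultdict(Counter)
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1

    matrix = {}
    for src in labels:
        total = sum(counts[src].values())
        # A state that never appears as a source gets an all-zero row.
        matrix[src] = {
            dst: (counts[src][dst] / total if total else 0.0) for dst in labels
        }
    return matrix


# Hypothetical predicted sequence; the labels are illustrative assumptions.
predicted = ["neutral", "neutral", "stressed", "stressed", "neutral", "stressed"]
labels = ["neutral", "stressed"]

P = estimate_transition_matrix(predicted, labels)
for src in labels:
    print(src, P[src])
```

Each row of `P` gives the estimated probability of moving from one state to each possible next state, so every row with at least one observed transition sums to 1.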

Highlights

  • To recognize stress and emotion, most of the existing methods only observe and analyze speech patterns from present-time features

  • The proposed DTMN consists of a hidden Markov model (HMM) and a time-delay neural network (TDNN)


Summary

Introduction

To recognize stress and emotion, most existing methods only observe and analyze speech patterns from present-time features. Many researchers use clustering algorithms to categorize unlabeled stressed speech data based on the similarity of their characteristics. However, emotion (especially stress) may change when triggered by an event while speaking [23]. We therefore argue that prior emotional states should be monitored so that the emotion of the speaker can be recognized more accurately. We can also take advantage of larger sets of contextual information [24]. To address this issue, we propose a novel method for predicting stress and emotions by analyzing prior emotional states. We named this method the deep time-delay Markov network (DTMN).
