Analysis of Closing-To-Opening Phase Ratio in Top-To-Bottom Glottal Pulse Segmentation for Psychological Stress Detection

Miroslav Stanek, Milan Sigmund

doi:10.5755/j01.eie.22.5.16348

Abstract

This paper investigates the differences between glottal pulses estimated by two algorithms, Direct Inverse Filtering (DIF) and Iterative and Adaptive Inverse Filtering (IAIF), for normal and stressed speech. Individual glottal pulses are extracted from the recorded speech signal and then normalized in two dimensions. Each normalized pulse is divided into a closing and an opening phase, which are further segmented into n-percentage sectors in the Top-To-Bottom (TTB) amplitude domain. Three parameters (kurtosis, skewness and pulse area) and their Closing-To-Opening phase ratios are analysed. The designed GMM classifier is trained on speakers from the Czech ExamStress database and then applied to another part of the ExamStress database and to the English SUSAS database, in order to investigate whether the approach is independent of the spoken language and the speech signal quality. The results achieved with DIF indicate independence of language and recording quality, in contrast to the methods using IAIF. The best-performing n-percentage sectors in the TTB segmentation lie between 5 % and 40 %. In this range, methods based on DIF reached an average psychological stress recognition efficiency of 88.5 %, whereas methods based on IAIF reached an average stress detection efficiency of 73.3 %.
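The per-pulse feature extraction summarized in the abstract can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. The two-dimensional normalization is assumed to be resampling to a fixed number of points on a [0, 1] time axis plus min-max amplitude scaling. The pulse is assumed to be split at its amplitude peak, with the samples before the peak forming the opening phase and those after it the closing phase. An n-percentage TTB sector is assumed to keep the samples whose normalized amplitude lies within the top n % of the amplitude range, and kurtosis and skewness are assumed to be computed over the amplitude samples of each sector. All function names (`normalize_pulse`, `split_phases`, `ttb_sector`, `sector_features`, `closing_to_opening_ratios`) and the default of 256 points are hypothetical.

```python
import numpy as np


def normalize_pulse(pulse, n_points=256):
    """Resample a pulse to n_points on a [0, 1] time axis and scale amplitude to [0, 1].

    Assumption: linear-interpolation resampling and min-max amplitude scaling.
    """
    pulse = np.asarray(pulse, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(pulse))
    t = np.linspace(0.0, 1.0, n_points)
    a = np.interp(t, t_old, pulse)  # linear-interpolation resampling
    span = a.max() - a.min()
    if span == 0:
        raise ValueError("a flat pulse cannot be normalized")
    return t, (a - a.min()) / span  # min-max amplitude scaling


def split_phases(t, a):
    """Split at the amplitude peak (assumed): opening before it, closing after it."""
    peak = int(np.argmax(a))
    return (t[: peak + 1], a[: peak + 1]), (t[peak:], a[peak:])


def ttb_sector(t, a, n_percent):
    """Keep samples in the top n % of the normalized amplitude range (assumed TTB rule).

    For a unimodal pulse each phase is monotonic, so the kept samples are contiguous.
    """
    mask = a >= 1.0 - n_percent / 100.0
    return t[mask], a[mask]


def sector_features(t, a):
    """Kurtosis and skewness of the sector's amplitude samples, plus its area."""
    nan = float("nan")
    if len(a) < 3 or a.std() == 0:
        return {"kurtosis": nan, "skewness": nan, "area": nan}
    z = (a - a.mean()) / a.std()
    area = float(np.sum((a[1:] + a[:-1]) * np.diff(t)) / 2.0)  # trapezoidal rule
    return {
        "kurtosis": float(np.mean(z**4)),  # Pearson (non-excess) kurtosis
        "skewness": float(np.mean(z**3)),
        "area": area,
    }


def closing_to_opening_ratios(pulse, n_percent, n_points=256):
    """Closing-phase feature divided by opening-phase feature for one pulse."""
    t, a = normalize_pulse(pulse, n_points)
    opening, closing = split_phases(t, a)
    f_open = sector_features(*ttb_sector(*opening, n_percent))
    f_close = sector_features(*ttb_sector(*closing, n_percent))
    return {
        key: f_close[key] / f_open[key] if f_open[key] != 0 else float("nan")
        for key in f_open
    }


# Usage on a synthetic, asymmetric pulse (not real glottal data):
# a slow opening up to t = 0.6 followed by a faster closing.
t_demo = np.linspace(0.0, 1.0, 200)
demo_pulse = np.where(
    t_demo < 0.6,
    np.sin(np.pi * t_demo / 1.2) ** 2,
    np.cos(np.pi * (t_demo - 0.6) / 0.8) ** 2,
)
for n in (5, 20, 40):
    print(n, closing_to_opening_ratios(demo_pulse, n_percent=n))
```

In this synthetic pulse the closing phase is the opening phase mirrored and compressed in time by a factor of 2/3. The area ratio should therefore come out roughly 2/3, and the kurtosis and skewness ratios near 1, up to a discretization error that grows as the sector shrinks. Pulses estimated by DIF or IAIF from real speech will not have this idealized shape.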

Highlights

The current trend is to monitor the actual emotional state of a speaker by non-invasive methods, such as remote analysis of the speech signal, mostly for employees in high-risk professions (e.g. pilots, rescuers) in order to avoid dangerous or unpleasant situations
The designed Gaussian Mixture Model (GMM) classifier is trained on speakers from the Czech ExamStress database and then applied to another part of the ExamStress database and to the English SUSAS database, to investigate whether the approach is independent of the spoken language and the speech signal quality
The highest average efficiency over the observed n-percentage intervals is reached with the Direct Inverse Filtering (DIF) estimation method (88.5 %), whose efficiency ε is a significant 15.2 percentage points higher than that of the Iterative and Adaptive Inverse Filtering (IAIF) estimation algorithm (73.3 %)
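The highlights mention a GMM classifier, but this section does not specify how it is built. A common construction, assumed here rather than taken from the paper, fits one Gaussian mixture per class (neutral and stressed) on the feature vectors, for example the Closing-To-Opening ratios. An utterance is then assigned to the class whose mixture gives the higher average log-likelihood. To stay self-contained, the sketch uses a hand-written diagonal-covariance EM in NumPy instead of a library implementation. The class names `DiagonalGMM` and `StressClassifier`, the number of mixture components, and the definition of the efficiency ε as the percentage of correctly classified utterances are all hypothetical.

```python
import numpy as np


class DiagonalGMM:
    """Gaussian mixture with diagonal covariances, fitted by expectation-maximization."""

    def __init__(self, n_components=2, n_iter=100, reg=1e-6, seed=0):
        self.k, self.n_iter, self.reg = n_components, n_iter, reg
        self.rng = np.random.default_rng(seed)

    def _log_joint(self, X):
        """log(weight_j) + log N(x | mean_j, diag(var_j)) for every sample and component."""
        diff = X[:, None, :] - self.means[None, :, :]
        log_pdf = -0.5 * (
            np.sum(diff**2 / self.vars[None], axis=2)
            + np.sum(np.log(2.0 * np.pi * self.vars), axis=1)[None]
        )
        return log_pdf + np.log(self.weights)[None]

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        n = X.shape[0]
        self.means = X[self.rng.choice(n, self.k, replace=False)]  # random init
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each sample
            lj = self._log_joint(X)
            resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
            # M-step: re-estimate weights, means and (floored) variances
            nk = resp.sum(axis=0) + 1e-12
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second = (resp.T @ X**2) / nk[:, None]
            self.vars = np.maximum(second - self.means**2, 0.0) + self.reg
        return self

    def score(self, X):
        """Average log-likelihood per feature vector."""
        lj = self._log_joint(np.asarray(X, dtype=float))
        return float(np.mean(np.logaddexp.reduce(lj, axis=1)))


class StressClassifier:
    """One GMM per class; an utterance goes to the class with the higher score."""

    def __init__(self, **gmm_kwargs):
        self.gmm_kwargs = gmm_kwargs

    def fit(self, X_neutral, X_stressed):
        self.neutral = DiagonalGMM(**self.gmm_kwargs).fit(X_neutral)
        self.stressed = DiagonalGMM(**self.gmm_kwargs).fit(X_stressed)
        return self

    def predict(self, X_utterance):
        """X_utterance: feature vectors (e.g. one per glottal pulse) of one utterance."""
        stressed = self.stressed.score(X_utterance) > self.neutral.score(X_utterance)
        return "stressed" if stressed else "neutral"


def efficiency(predictions, labels):
    """Percentage of correctly classified utterances (assumed definition of epsilon)."""
    hits = sum(p == l for p, l in zip(predictions, labels))
    return 100.0 * hits / len(labels)


# Usage with synthetic 3-D features (not ExamStress or SUSAS data).
rng = np.random.default_rng(1)
clf = StressClassifier(n_components=2).fit(
    rng.normal(0.0, 1.0, (200, 3)), rng.normal(3.0, 1.0, (200, 3))
)
test_utts = [rng.normal(m, 1.0, (30, 3)) for m in (0.0, 0.0, 3.0, 3.0)]
preds = [clf.predict(u) for u in test_utts]
print(preds, efficiency(preds, ["neutral", "neutral", "stressed", "stressed"]))
```

Scoring whole utterances by their average log-likelihood is one design choice among several; per-pulse voting would be an equally plausible alternative. The cross-database setting in the highlights would correspond to fitting on one corpus and calling `predict` on feature vectors from another.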

Summary

Introduction

The current trend is to monitor the actual emotional state of a speaker by non-invasive methods, such as remote analysis of the speech signal, mostly for employees in high-risk professions (e.g. pilots, rescuers) in order to avoid dangerous or unpleasant situations. Psychological stress can be classified as an emotion, and the psychological state influences human behaviour and self-confidence. For this reason, it is desirable to recognize a speaker's stress immediately, especially in situations where the speaker's behaviour is negatively influenced by distress. Many stress detection methods exist; most of them are based on directly extracted speech features such as MFCC [1], pitch [2] and formants [3].

Research described in this paper was financed by the Czech Ministry of Education within the National Sustainability Program under grant LO1401.

Methods

Results

Conclusion