Abstract

Real stressed speech is affected by various factors (individual characteristics and environment), so stress patterns are diverse and differ from one individual to another. To address this, in our previous work we proposed an unsupervised clustering method, called deep time-delay embedded clustering (DTEC), that learns the feature representations of stress speech and the clustering assignments simultaneously in a self-learning manner. However, DTEC does not confirm that its output clusters are compatible with the informational classes. Therefore, we propose semi-supervised deep time-delay embedded clustering (SDTEC), a new semi-supervised framework built on DTEC. SDTEC incorporates prior information, in the form of pairwise constraints, in the embedding layer and simultaneously learns the feature representation and the clustering assignments. The prior information guides the clustering procedure so that points assigned to an incorrect cluster can be corrected. The effectiveness of the proposed SDTEC was evaluated by comparing it with several baseline methods in terms of the clustering error rate (CER). Moreover, to demonstrate SDTEC's capabilities, we conducted a comprehensive ablation study. Based on the experimental results, SDTEC outperformed the baseline methods and achieved state-of-the-art results in semi-supervised clustering.
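
To make the idea of pairwise constraints concrete, the following is a minimal sketch of a constraint penalty on embedded points. It is not the SDTEC objective, which this summary does not state. The function name `pairwise_constraint_loss`, the contrastive-style form, and the `margin` parameter are all assumptions made for illustration: must-link pairs are pulled together, and cannot-link pairs are pushed at least `margin` apart.

```python
import math

def pairwise_constraint_loss(z, must_link, cannot_link, margin=1.0):
    """Contrastive-style penalty over constrained pairs (illustrative only).

    z           : list of embedded points, each a list of floats
    must_link   : list of (i, j) index pairs known to share a class
    cannot_link : list of (i, j) index pairs known to differ in class
    margin      : hypothetical minimum distance for cannot-link pairs
    """
    loss = 0.0
    # Must-link pairs: penalize the squared distance between the two points.
    for i, j in must_link:
        loss += math.dist(z[i], z[j]) ** 2
    # Cannot-link pairs: penalize only when closer than the margin.
    for i, j in cannot_link:
        loss += max(0.0, margin - math.dist(z[i], z[j])) ** 2
    n_pairs = len(must_link) + len(cannot_link)
    return loss / n_pairs if n_pairs else 0.0

# Toy embeddings: points 0 and 1 are close, point 2 is far away.
z = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
loss_ok = pairwise_constraint_loss(z, must_link=[(0, 1)], cannot_link=[(0, 2)])
loss_violated = pairwise_constraint_loss(z, must_link=[], cannot_link=[(0, 1)], margin=2.0)
```

In a deep clustering setting, a term of this kind would typically be added to the clustering and reconstruction losses and minimized jointly. How SDTEC weights and combines its losses is covered in the full paper, not here.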

Highlights

  • In psychological sciences, stress is an unconscious emotion caused by environmental stimuli [1]. The human body responds to stress by releasing hormones that increase heart rate, breathing rate, and muscle tension [2].

  • We assess the effectiveness of the proposed semi-supervised deep time-delay embedded clustering (SDTEC) in categorizing the stress speech data of the Speech Under Simulated and Actual Stress (SUSAS) dataset in terms of clustering error rate (CER).
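
As a rough illustration of the CER metric named above, the sketch below uses a common definition: one minus the accuracy under the best one-to-one mapping from cluster IDs to class labels. This summary does not give the paper's exact formula, so that definition is an assumption, as is the helper name `clustering_error_rate`. The brute-force search over mappings is only practical for a handful of clusters.

```python
from itertools import permutations

def clustering_error_rate(true_labels, cluster_ids):
    """1 - accuracy under the best one-to-one cluster-to-class mapping."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")

    best_correct = 0
    # Try every injective assignment of clusters to classes.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(
            1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t
        )
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / len(true_labels)

# Toy example: cluster IDs are arbitrary, so 1 <-> 0 swapped is still "correct".
true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [1, 1, 0, 0, 2, 0]
cer = clustering_error_rate(true_labels, cluster_ids)
```

Because cluster IDs carry no inherent meaning, the mapping step is what makes the score comparable across methods. For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the exhaustive search.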


Summary

Introduction

Stress is an unconscious emotion caused by environmental stimuli [1]. In real situations, stress characteristics are diverse and follow different patterns for each individual because of factors such as individual characteristics, gender, experience, and emotional tendencies [7]. Over the past decade, unsupervised clustering has been explored as a way to categorize stress speech data by defining an effective objective in a self-learning manner [8,9,10]. In our previous work [20], we proposed a new deep clustering architecture that uses the time-delay neural network (TDNN) structure to build the autoencoder. We named it deep time-delay embedded clustering (DTEC).

Related Works
Semi-Supervised Deep Time-Delay Embedded Clustering
Nonlinear Transformation
Stress Speech Recognition Model Based Pairwise Constraints
Objective Function of the Network
Dataset
Experiment Settings
Baseline Clustering Methods
Results and Discussion
Evaluation Result
Method
Ablation Study
The Effect of Losses
The Effect of the Number of Constraints
Conclusions