Using Hybrid HMM/DNN Embedding Extractor Models in Computational Paralinguistic Tasks.

Mercedes Vetráb, Gábor Gosztolya

doi:10.3390/s23115208

Abstract

The field of computational paralinguistics emerged from automatic speech processing and covers a wide range of tasks involving different phenomena present in human speech. It focuses on the non-verbal content of speech, including tasks such as spoken emotion recognition, conflict intensity estimation and sleepiness detection, with straightforward applications in remote monitoring with acoustic sensors. The two main technical issues in computational paralinguistics are (1) handling varying-length utterances with traditional classifiers and (2) training models on relatively small corpora. In this study, we present a method that combines automatic speech recognition (ASR) and paralinguistic approaches and can handle both of these issues. Specifically, we trained an HMM/DNN hybrid acoustic model on a general ASR corpus and then used it as a source of embeddings that serve as features for several paralinguistic tasks. To convert the local embeddings into utterance-level features, we experimented with five different aggregation methods, namely mean, standard deviation, skewness, kurtosis and the ratio of non-zero activations. Our results show that the proposed feature extraction technique consistently outperforms the widely used x-vector method, which served as the baseline, independently of the paralinguistic task investigated. Furthermore, the aggregation techniques can also be combined effectively, leading to further improvements depending on the task and on the neural network layer that serves as the source of the local embeddings. Overall, based on our experimental results, the proposed method can be considered a competitive and resource-efficient approach for a wide range of computational paralinguistic tasks.
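The five aggregation methods named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `aggregate_embeddings`, the use of population statistics, excess kurtosis, the `eps` guard and the random stand-in data (clipped at zero to mimic sparse activations) are all assumptions made for illustration. It only shows how a variable number of local embedding vectors can be reduced to one fixed-length utterance-level feature vector.

```python
import numpy as np


def aggregate_embeddings(frames: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Reduce local embeddings of shape (n_frames, n_units) to a vector of size 5 * n_units.

    Hypothetical helper: the exact statistical definitions used in the paper
    are not given in the abstract, so population moments are assumed here.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)  # population standard deviation (assumption)
    centered = frames - mean
    safe_std = np.maximum(std, eps)  # avoids division by zero for constant units
    skewness = (centered ** 3).mean(axis=0) / safe_std ** 3
    # Excess kurtosis (assumption); a constant unit yields -3 with this guard.
    kurtosis = (centered ** 4).mean(axis=0) / safe_std ** 4 - 3.0
    nonzero_ratio = (frames != 0).mean(axis=0)  # share of frames with non-zero activation
    return np.concatenate([mean, std, skewness, kurtosis, nonzero_ratio])


# Stand-in data: two "utterances" of different lengths with 8 units each.
rng = np.random.default_rng(0)
short_utt = np.maximum(rng.standard_normal((120, 8)), 0.0)
long_utt = np.maximum(rng.standard_normal((450, 8)), 0.0)

feat_short = aggregate_embeddings(short_utt)
feat_long = aggregate_embeddings(long_utt)
print(feat_short.shape, feat_long.shape)
```

Both utterances map to vectors of the same length regardless of their frame count, which is what allows a traditional fixed-input classifier to be trained on them. Using only a subset of the five statistics corresponds to concatenating fewer blocks.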

Journal: Sensors (Basel, Switzerland)
Publication Date: May 30, 2023
Citations: 3
License type: CC BY 4.0

Similar Papers

Posterior-thresholding feature extraction for paralinguistic speech classification
Gábor Gosztolya
Knowledge-Based Systems | VOL. 186
16 Aug 2019

Prosody in Automatic Speech Processing
Anton Batliner ... Bernd Möbius
31 Dec 2020

A feature selection-based speaker clustering method for paralinguistic tasks
Gábor Gosztolya ... László Tóth
Pattern Analysis and Applications | VOL. 21
23 Mar 2017

Effects of Business Embedded & Traditional Training Models on Motivation
Syed Akif Hasan ... Muhammad Imtiaz Subhani
Journal of Economics and Behavioral Studies | VOL. 2
15 May 2011
