Unsupervised speech representation learning for behavior modeling using triplet enhanced contextualized networks

Haoqi Li, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

doi:10.1016/j.csl.2021.101226


Open Access

https://doi.org/10.1016/j.csl.2021.101226


Abstract

Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging, in part because training data are scarce: specific behavioral patterns often occur infrequently. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and corresponding manually annotated data, which makes generalizing across domains difficult. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework to capture the behavioral context and derive a manifold representation in which speech frames with similar behaviors lie closer together while frames with different behaviors remain farther apart. The models are trained on movie audio data and validated on diverse domains, including a couples therapy corpus and other publicly collected data (e.g., stand-up comedy). The encouraging results indicate that unsupervised learning is feasible for cross-domain behavioral modeling.
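
The triplet idea in the abstract (nearby segments share a behavioral context, so their representations should be close, while segments with different behavior should be far apart) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the margin value, the `max_gap` window, and in particular the choice to draw negatives from a different recording are assumptions made only for illustration; the abstract does not specify how triplets are selected or which distance is used.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplet(sessions, max_gap=1, rng=random):
    """Pick an (anchor, positive, negative) triple of segment embeddings.

    sessions: dict mapping a recording id to its time-ordered list of
              segment embeddings (hypothetical data layout).
    Assumption: the positive is a segment within `max_gap` positions of
    the anchor in the same recording; the negative comes from a
    different recording.
    """
    eligible = [sid for sid, segs in sessions.items() if len(segs) >= 2]
    sid = rng.choice(eligible)
    other_ids = [s for s in sessions if s != sid and sessions[s]]
    neg_sid = rng.choice(other_ids)

    segs = sessions[sid]
    i = rng.randrange(len(segs))
    lo, hi = max(0, i - max_gap), min(len(segs), i + max_gap + 1)
    neighbors = [j for j in range(lo, hi) if j != i]
    j = rng.choice(neighbors)
    return segs[i], segs[j], rng.choice(sessions[neg_sid])


if __name__ == "__main__":
    toy = {"rec_a": [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1]],
           "rec_b": [[3.0, 4.0], [3.1, 4.2]]}
    a, p, n = sample_triplet(toy, rng=random.Random(0))
    print(triplet_loss(a, p, n))
```

In a trained system the embeddings would come from an encoder such as the DCN described above, and the loss would be minimized by gradient descent; the toy vectors here only show how the loss rewards the desired geometry.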


Journal: Computer Speech & Language
Publication Date: Apr 22, 2021
Citations: 1
License type: publisher-specific-oa
