A sticky HDP-HMM with application to speaker diarization

Emily B Fox, Alan S Willsky, Erik B Sudderth, Michael I Jordan

doi:10.1214/10-aoas395

Abstract

We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566–1581]. Although the basic HDP-HMM tends to over-segment the audio data—creating redundant states and rapidly switching among them—we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.
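
The switching-rate control mentioned in the abstract can be pictured with a minimal sketch. It assumes the usual "sticky" construction, in which an extra self-transition mass `kappa` is added to each state's Dirichlet prior, and it uses a finite truncation to `L` states as a stand-in for the Dirichlet process. The abstract spells out neither detail, so the parameter names (`L`, `gamma`, `alpha`, `kappa`) and all function names are illustrative assumptions rather than the authors' code. The sketch only samples from the prior; it does not implement the paper's inference algorithm.

```python
import random


def dirichlet(conc, rng):
    """Draw from a Dirichlet distribution by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in conc]
    total = sum(draws)
    return [d / total for d in draws]


def sample_sticky_transitions(L, gamma, alpha, kappa, rng):
    """Draw global weights and an L x L transition matrix from a truncated sticky prior.

    Assumed form: beta ~ Dir(gamma/L, ..., gamma/L) and
    row j ~ Dir(alpha * beta + kappa * [k == j]).
    """
    # Global state weights shared by every row (finite stand-in for the DP).
    beta = dirichlet([gamma / L] * L, rng)
    pi = []
    for j in range(L):
        # Row j is centered on beta; the tiny floor only guards against
        # a zero concentration parameter if a weight underflows.
        conc = [alpha * b + 1e-12 for b in beta]
        conc[j] += kappa  # extra prior mass on the self-transition
        pi.append(dirichlet(conc, rng))
    return beta, pi


def sample_states(pi, T, rng):
    """Simulate a length-T state sequence from transition matrix pi."""
    L = len(pi)
    z = [rng.randrange(L)]
    for _ in range(1, T):
        z.append(rng.choices(range(L), weights=pi[z[-1]])[0])
    return z


def switch_count(z):
    """Number of time steps at which the state changes."""
    return sum(1 for a, b in zip(z, z[1:]) if a != b)


if __name__ == "__main__":
    rng = random.Random(0)
    for kappa in (0.0, 50.0):
        _, pi = sample_sticky_transitions(L=10, gamma=5.0, alpha=1.0, kappa=kappa, rng=rng)
        z = sample_states(pi, T=1000, rng=rng)
        print(f"kappa={kappa}: {switch_count(z)} switches")
```

Setting `kappa=0` gives a truncated prior with no self-transition bias. Under the assumed form, a larger `kappa` concentrates each row of the transition matrix on its diagonal entry, so sampled sequences tend to stay longer in each state.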

Journal: The Annals of Applied Statistics	Publication Date: Jun 1, 2011
Citations: 346	License type: implied-oa

Similar Papers

A Doubly Hierarchical Dirichlet Process Hidden Markov Model with a Non-Ergodic Structure
Amir H Harati Nejad Torbati ... Joseph Picone
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 24
01 Jan 2015

An HDP-HMM for systems with state persistence
Emily B Fox ... Alan S Willsky
01 Jan 2008

Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions
Weiming Hu ... Guodong Tian
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 40
25 Sep 2017

Extracting Overtaking Segments by Unsupervised Clustering and Predicting Nonmotorized Vehicle’s Trajectory
Ailing Yin ... Xiaohong Chen
Journal of Advanced Transportation | VOL. 2022
13 Apr 2022
