Acoustic Segment Modeling with Spectral Clustering Methods

Haipeng Wang, Haizhou Li, Bin Ma, Cheung-Chi Leung, Tan Lee

doi:10.1109/taslp.2014.2387382

Abstract

This paper presents a study of spectral clustering-based approaches to acoustic segment modeling (ASM). ASM aims to find the underlying phoneme-like speech units and build the corresponding acoustic models in an unsupervised setting, where neither prior linguistic knowledge nor manual transcriptions are available. A typical ASM process involves three stages: initial segmentation, segment labeling, and iterative modeling. This work focuses on improving segment labeling. Specifically, we use posterior features as the segment representations and apply spectral clustering algorithms to these posterior representations. We propose a Gaussian component clustering (GCC) approach and a segment clustering (SC) approach. GCC applies spectral clustering to a set of Gaussian components, and SC applies spectral clustering to a large number of speech segments. Moreover, to exploit the complementary information in different posterior representations, we propose a multiview segment clustering (MSC) approach, which uses multiple posterior representations simultaneously to cluster speech segments. To address the computational cost of spectral clustering on large numbers of speech segments, we use an inner product similarity graph and reformulate the problem to avoid explicitly computing the affinity matrix and the Laplacian matrix. We carried out two sets of experiments for evaluation. First, we evaluated ASM accuracy on the OGI-MTS dataset, where our approach yielded an 18.7% relative purity improvement and a 15.1% relative improvement in normalized mutual information (NMI) over the baseline approach. Second, we examined the performance of our approaches in a real application, zero-resource query-by-example spoken term detection on the SWS2012 dataset, where they provided consistent improvements across four testing scenarios and three evaluation metrics.
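The abstract does not spell out the reformulation, so the following is only a minimal sketch of one standard way to run spectral clustering on an inner product graph without ever building the N x N affinity matrix; it should not be read as the authors' exact method. It assumes each segment is a nonnegative posterior vector (a row of `X`), takes the affinity to be `W = X @ X.T`, and relies on the fact that the normalized affinity `D^-1/2 W D^-1/2` equals `Z @ Z.T` for `Z = D^-1/2 X`, so its leading eigenvectors are the leading left singular vectors of the much smaller matrix `Z`. The function names, the toy data, and the simple k-means routine are all hypothetical and chosen for illustration.

```python
import numpy as np

def spectral_embed_inner_product(X, k):
    """Top-k spectral embedding for W = X @ X.T, without forming W (sketch)."""
    X = np.asarray(X, dtype=float)
    # Degree of node i: sum_j <x_i, x_j> = <x_i, sum_j x_j>, costing O(N * dim).
    degrees = X @ X.sum(axis=0)
    Z = X / np.sqrt(degrees)[:, None]
    # Left singular vectors of Z are eigenvectors of Z @ Z.T = D^-1/2 W D^-1/2.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    U = U[:, :k]
    # Row-normalise the embedding before clustering.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans_farthest_init(points, k, n_iter=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy data (assumption): 40 "segments" with 6-dimensional posteriors.
# The first 20 put most mass on components 0-2, the last 20 on components 3-5.
rng = np.random.default_rng(0)
X = np.zeros((40, 6))
X[:20, :3] = rng.dirichlet(np.ones(3), size=20)
X[20:, 3:] = rng.dirichlet(np.ones(3), size=20)
X = 0.9 * X + 0.1 / 6  # light smoothing so every posterior entry is positive

embedding = spectral_embed_inner_product(X, k=2)
labels = kmeans_farthest_init(embedding, k=2)
print(labels)
```

In this sketch the cost is dominated by an SVD of an N x dim matrix, which stays manageable when the posterior dimension is far smaller than the number of segments; a dense N x N affinity matrix would not.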

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Feb 1, 2015
Citations: 59
