Abstract

This paper presents the design of a mixture of Gaussian Mixture Models (GMMs) for Query-by-Example Spoken Term Detection (QbE-STD). Speech data exhibits acoustically similar broad phonetic structure. To capture this structure, we exploit additional information from broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) when training the GMMs. The mixture of GMMs is tied to the GMMs of these broad phoneme classes, i.e., each GMM expresses the probability density function (pdf) of a broad phoneme category. The Expectation Maximization (EM) algorithm is used to obtain the GMM for each broad phoneme class. Thus, the mixture of GMMs represents the spoken query under broad phonetic constraints. These constraints restrict the posterior probability within the broad class, which results in a better posteriorgram design. The novelty of our work lies in the prior probability assignments (as weights of the mixture of GMMs) for better Gaussian posteriorgram design. The proposed simple yet effective posteriorgram outperforms the Gaussian posteriorgram because of the implicit constraints supplied by broad phonetic posteriors. The Maximum Term Weighted Value (MTWV) for the SWS 2013 dataset is improved by 0.052 and 0.051 w.r.t. the Gaussian posteriorgram for Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP), respectively. We found that the proposed mixture-of-GMMs approach gave consistently better performance than the Gaussian posteriorgram across various evaluation factors, such as different cepstral representations, the number of Gaussian components, the number of spoken examples per query, and the amount of labeled data used for broad phoneme posterior computation.
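As a rough illustration of the construction described above, the following sketch fits one GMM per broad phoneme class with EM and stacks the class-conditional component posteriors, weighted by class priors, into a single posteriorgram. This is a minimal sketch, assuming scikit-learn's GaussianMixture for EM training, per-class cepstral (e.g., MFCC or PLP) frame arrays as input, and relative frame counts as the class priors; the function names and the choice of prior are illustrative and not necessarily the paper's exact prior-assignment scheme.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Hypothetical broad phoneme classes; each maps to an array of labeled
    # cepstral frames used to train that class's GMM with EM.
    BROAD_CLASSES = ["vowel", "semivowel", "nasal", "fricative", "plosive"]

    def train_class_gmms(frames_by_class, n_components=32):
        # Fit one GMM per broad phoneme class (sklearn's GaussianMixture runs EM).
        # Class priors are taken as relative frame counts -- an assumption,
        # standing in for the paper's prior probability assignments.
        total = sum(len(f) for f in frames_by_class.values())
        gmms, priors = {}, {}
        for cls, frames in frames_by_class.items():
            gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
            gmm.fit(frames)
            gmms[cls] = gmm
            priors[cls] = len(frames) / total
        return gmms, priors

    def mixture_of_gmms_posteriorgram(features, gmms, priors):
        # Posteriorgram whose dimensions are the Gaussian components of all
        # class GMMs: P(class c, component k | frame x) is proportional to
        # prior_c * p(x | c) * P(k | x, c), so posteriors stay tied to classes.
        log_blocks = []
        for cls, gmm in gmms.items():
            log_lik = gmm.score_samples(features)        # log p(x | c), shape (T,)
            comp_post = gmm.predict_proba(features)      # P(k | x, c), shape (T, K)
            log_blocks.append(np.log(priors[cls]) + log_lik[:, None]
                              + np.log(comp_post + 1e-300))
        log_post = np.concatenate(log_blocks, axis=1)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)    # rows sum to 1

A query and a test utterance could each be converted to such a posteriorgram and compared frame by frame, for example with dynamic time warping, as is standard in QbE-STD.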
