Unsupervised learning of time–frequency patches as a noise-robust representation of speech

Maarten Van Segbroeck, Hugo Van Hamme

doi:10.1016/j.specom.2009.05.003

Abstract

We present a self-learning algorithm that uses a bottom-up approach to automatically discover, acquire and recognize the words of a language. First, an unsupervised technique based on non-negative matrix factorization (NMF) discovers phone-sized time–frequency patches into which speech can be decomposed. The input matrix for the NMF is constructed for static and dynamic speech features using a spectral representation of both short and long acoustic events. Describing speech in terms of the discovered time–frequency patches yields patch activations, which express to what extent each patch is present across time. We then show that speaker-independent patterns appear to recur in these patch activations and that they can be discovered by applying a second NMF-based algorithm to the co-occurrence counts of activation events. When information about word identity is provided to the learning algorithm, the retrieved patterns can be associated with meaningful objects of the language. In the case of a small-vocabulary task, the system is able to learn patterns corresponding to words and subsequently detect the presence of these words in speech utterances. Without requiring the prior expert knowledge about speech that conventional automatic speech recognition relies on, we illustrate that the learning algorithm achieves promising accuracy and noise robustness.
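The first stage described above, decomposing a non-negative spectral matrix into patches and their activations, can be illustrated with a minimal sketch. It is not the authors' implementation: the generalized Kullback-Leibler cost, the multiplicative update rules, the synthetic random input, and the names `nmf_kl` and `kl_divergence` are all assumptions made for illustration. The columns of `W` play the role of the time–frequency patches and the rows of `H` the role of the patch activations across time.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence between V and its approximation WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (features x frames) as W @ H.

    Uses multiplicative updates for the generalized KL divergence
    (an assumed cost function, chosen here only for illustration).
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps  # hypothetical "patches"
    H = rng.random((rank, n_frames)) + eps    # hypothetical "activations"
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


# Synthetic stand-in for a spectral feature matrix: 40 features, 300 frames,
# generated from 5 hidden non-negative components.
rng = np.random.default_rng(1)
V = rng.random((40, 5)) @ rng.random((5, 300))

W, H = nmf_kl(V, rank=5)
print("patches:", W.shape, "activations:", H.shape)
print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

In a real system, `V` would hold stacked spectral features rather than random data, and the second stage mentioned in the abstract would apply a further factorization to co-occurrence counts derived from `H`.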

Journal: Speech Communication
Publication Date: May 18, 2009
Citations: 47