Noise Robust Exemplar Matching Using Sparse Representations of Speech

Emre Yilmaz,Jort Florent Gemmeke,Hugo Van Hamme

doi:10.1109/taslp.2014.2329188

Abstract

Performing automatic speech recognition using exemplars (templates) holds the promise to provide a better duration and coarticulation modeling compared to conventional approaches such as hidden Markov models (HMMs). Exemplars are spectrographic representations of speech segments extracted from the training data, each associated with a speech unit, e.g. phones, syllables, half-words or words, and preserve the complete spectro-temporal content of the speech. Conventional exemplar-matching approaches to automatic speech recognition systems, such as those based on dynamic time warping, have typically focused on evaluation in clean conditions. In this paper, we propose a novel noise robust exemplar matching framework for automatic speech recognition. This recognizer approximates noisy speech segments as a weighted sum of speech and noise exemplars and performs recognition by comparing the reconstruction errors of different classes with respect to a divergence measure. We evaluate the system performance in keyword recognition on the small vocabulary track of the 2nd CHiME Challenge and connected digit recognition on the AURORA-2 database. The results show that the proposed system achieves comparable results with state-of-the-art noise robust recognition systems.

Full Text