Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR

Tara N Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky, Michael Picheny, David Nahamoo

doi:10.1109/tasl.2011.2155060

Abstract

Exemplar-based methods such as support vector machines (SVMs), k-nearest neighbors (kNNs), and sparse representations (SRs) have so far seen limited use in speech recognition. These techniques draw on information about individual training examples and are computationally expensive, which makes them particularly difficult to investigate on large-vocabulary continuous speech recognition (LVCSR) tasks. While LVCSR provides a good testbed for real-world speech recognition problems, research in this area suffers from two main drawbacks. First, the overall complexity of an LVCSR system makes error analysis quite difficult. Second, exploring new research ideas on LVCSR tasks involves training and testing state-of-the-art LVCSR systems, which can mean long turnaround times. This makes a small-vocabulary task such as TIMIT more appealing: TIMIT provides a phonetically rich, hand-labeled corpus that allows easy insight into new algorithms. However, research ideas explored on small-vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we combine the advantages of small- and large-vocabulary tasks by taking well-established techniques used in LVCSR systems and applying them to TIMIT to establish a new baseline. We then use these existing LVCSR techniques to create a novel set of exemplar-based sparse representation (SR) features. Using the existing LVCSR techniques, we achieve a phonetic error rate (PER) of 19.4% on the TIMIT task. The additional use of SR features reduces the PER to 18.6%. We then explore applying the SR features to a large-vocabulary Broadcast News task, where we achieve a 0.3% absolute reduction in word error rate (WER).

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Nov 1, 2011
Citations: 104
