Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

G Garau, S Renals

doi:10.1109/tasl.2008.916519

Abstract

In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, Straight, in combination with conventional features such as Mel frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalization (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA results in a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.
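The abstract describes combining the two spectral representations at the feature level with HLDA. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Two synthetic 3-dimensional streams stand in for MFCC and Straight-derived frames. The integer `labels` are hypothetical stand-ins for the frame-to-class alignments (for example HMM states) that a real LVCSR system would supply. The square transform is estimated with an iterative row-by-row maximum-likelihood update. All function and variable names (`estimate_hlda`, `p`, `n_iter`, and so on) are illustrative choices.

```python
import numpy as np


def class_stats(x, labels):
    """Per-class frame counts, within-class covariances, and the total covariance."""
    classes = np.unique(labels)
    counts = np.array([np.sum(labels == c) for c in classes], dtype=float)
    within = np.stack([np.cov(x[labels == c], rowvar=False, bias=True) for c in classes])
    total = np.cov(x, rowvar=False, bias=True)
    return counts, within, total


def diag_variances(a, within, total, p):
    """Variances of each transformed dimension: per class for the p useful rows,
    shared (from the total covariance) for the nuisance rows."""
    class_var = np.einsum("ik,jkl,il->ji", a[:p], within, a[:p])  # (classes, p)
    nuis_var = np.einsum("ik,kl,il->i", a[p:], total, a[p:])      # (dim - p,)
    return class_var, nuis_var


def hlda_objective(a, counts, within, total, p):
    """HLDA log-likelihood of the data, dropping additive constants."""
    class_var, nuis_var = diag_variances(a, within, total, p)
    _, logdet = np.linalg.slogdet(a)
    n_total = counts.sum()
    return (n_total * logdet
            - 0.5 * np.sum(counts[:, None] * np.log(class_var))
            - 0.5 * n_total * np.sum(np.log(nuis_var)))


def estimate_hlda(x, labels, p, n_iter=10):
    """Estimate a square HLDA transform by row-by-row maximum-likelihood updates."""
    counts, within, total = class_stats(x, labels)
    n_total = counts.sum()
    dim = x.shape[1]
    a = np.eye(dim)
    history = [hlda_objective(a, counts, within, total, p)]
    for _ in range(n_iter):
        # Variances are re-estimated once, then held fixed for one sweep over the rows.
        class_var, nuis_var = diag_variances(a, within, total, p)
        for i in range(dim):
            if i < p:
                g = np.einsum("j,jkl->kl", counts / class_var[:, i], within)
            else:
                g = (n_total / nuis_var[i - p]) * total
            # Row i of the cofactor matrix of `a`, up to a scale factor that cancels below.
            c = np.linalg.inv(a)[:, i]
            g_inv_c = np.linalg.solve(g, c)
            a[i] = g_inv_c * np.sqrt(n_total / (c @ g_inv_c))
        history.append(hlda_objective(a, counts, within, total, p))
    return a, history


# Synthetic stand-ins for two frame-synchronous feature streams (hypothetical data).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 200)
class_means = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
stream_a = class_means[labels] + rng.normal(size=(600, 3))                 # "MFCC-like"
stream_b = 0.5 * class_means[labels][:, ::-1] + rng.normal(size=(600, 3))  # "Straight-like"

combined = np.hstack([stream_a, stream_b])  # feature-level combination: 6 dims per frame
a, history = estimate_hlda(combined, labels, p=2)
projected = combined @ a[:2].T  # the first p = 2 rows are the class-dependent (useful) dims
print(projected.shape, round(history[0], 1), round(history[-1], 1))
```

Only the first `p` rows of the estimated transform are used for recognition. The remaining rows model the discarded dimensions with a single shared Gaussian, which is what distinguishes HLDA from simply truncating the concatenated feature vector. Each sweep holds the variances fixed and maximizes the likelihood one row at a time, so the recorded objective should not decrease from one iteration to the next.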

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Mar 1, 2008
Citations: 72
License type: other-oa

Similar Papers

A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition
Umit H Yapanel ... John H.L Hansen
Speech Communication | VOL. 50
19 Sep 2007

A hybrid approach to compounds in LVCSR
Tom Laureys ... Vincent Vandeghinste
16 Sep 2002

Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR
Tara N Sainath ... David Nahamoo
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 19
01 Nov 2011

Challenges and Techniques for Dialectal Arabic Speech Recognition and Machine Translation
Mohamed Elmahdy
Qatar Foundation Annual Research Forum Proceedings | VOL. 2011
01 Nov 2011