The Design for the Wall Street Journal-based CSR Corpus

Douglas B. Paul, Janet M. Baker

doi:10.21437/icslp.1992-277

Abstract

The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA with its first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hours) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.
