SVitchboard-II and FiSVer-I: Crafting high quality and low complexity conversational English speech corpora using submodular function optimization

Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes

doi:10.1016/j.csl.2016.10.002

Abstract

We introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional automatic speech recognition (ASR) research requires considerable computational resources and has slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities in order to facilitate the development of novel speech recognition techniques on smaller but still acoustically rich, diverse, and hence interesting corpora. We select these corpora to maximize an acoustic quality criterion while limiting the vocabulary size (from 10 words up to 10,000 words), where both “acoustic quality” and vocabulary size are measured via various submodular functions. We also survey numerous submodular functions that could be useful to measure both “acoustic quality” and “corpus complexity” and offer guidelines on when and why a scientist may wish to use one vs. another. The corpus selection process itself is performed using various state-of-the-art submodular function optimization procedures, including submodular level-set constrained submodular optimization (SCSC/SCSK), difference-of-submodular (DS) optimization, and unconstrained submodular minimization (SFM), all of which are fully defined herein. While the focus of this paper is on the resultant speech corpora and the survey of possible objectives, a consequence of the paper is a thorough empirical comparison of the relative merits of these modern submodular optimization procedures. We provide baseline word recognition results on all of the resultant speech corpora for both Gaussian mixture model (GMM) and deep neural network (DNN)-based systems, and we have released all of the corpora definitions and Kaldi training recipes for free in the public domain.
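The selection idea in the abstract (maximize a submodular quality score while keeping vocabulary size under a budget) can be illustrated with a toy sketch. This is not the paper's SCSC/SCSK, DS, or SFM procedure; it is a simple ratio-greedy heuristic swapped in for illustration. The function names, the square-root feature-based quality function, the word-coverage complexity measure, and the toy utterances are all assumptions made here, not taken from the paper.

```python
import math
from typing import Dict, Set, Tuple


def greedy_select(
    feats: Dict[str, Dict[str, float]],
    words: Dict[str, Set[str]],
    vocab_budget: int,
) -> Tuple[Set[str], Set[str]]:
    """Ratio-greedy heuristic (illustrative only, not the paper's algorithm).

    Quality f(S) = sum_j sqrt(total mass of feature j in S), a monotone
    submodular function when feature masses are non-negative.
    Complexity g(S) = number of distinct words used by S (also submodular).
    Returns the selected utterance ids and the vocabulary they cover.
    """
    selected: Set[str] = set()
    covered: Set[str] = set()
    totals: Dict[str, float] = {}  # accumulated mass per feature
    remaining = set(feats)

    while remaining:
        best, best_score = None, 0.0
        for u in sorted(remaining):  # sorted for deterministic tie-breaking
            new_words = words[u] - covered
            if len(covered) + len(new_words) > vocab_budget:
                continue  # adding u would exceed the vocabulary budget
            # Marginal quality gain of adding u to the current selection.
            gain = sum(
                math.sqrt(totals.get(j, 0.0) + m) - math.sqrt(totals.get(j, 0.0))
                for j, m in feats[u].items()
            )
            # Quality gained per unit of added complexity (+1 avoids 0 division).
            score = gain / (1 + len(new_words))
            if score > best_score:
                best, best_score = u, score
        if best is None:
            break  # nothing feasible with positive gain remains
        selected.add(best)
        covered |= words[best]
        for j, m in feats[best].items():
            totals[j] = totals.get(j, 0.0) + m
        remaining.remove(best)

    return selected, covered


# Hypothetical toy data: three utterances with made-up features and words.
toy_feats = {"u1": {"a": 1.0}, "u2": {"b": 1.0}, "u3": {"a": 1.0, "b": 1.0}}
toy_words = {"u1": {"yeah"}, "u2": {"yeah", "okay"}, "u3": {"i", "think", "so"}}
chosen, vocab = greedy_select(toy_feats, toy_words, vocab_budget=2)
```

With a budget of two words, the sketch keeps the two utterances that share vocabulary and skips the one that would introduce three new words. The greedy ratio rule carries no guarantee here; it only shows how the two submodular measures interact.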


Journal: Computer Speech & Language
Publication Date: Oct 27, 2016
Citations: 2

Similar Papers

Submodular data selection with acoustic and phonetic features for automatic speech recognition
Chongjia Ni ... Li Lu
01 Apr 2015

Mongolian Speech Recognition Based on Deep Neural Networks
Hui Zhang ... Guanglai Gao
01 Jan 2015

UCSY-SC1: A Myanmar speech corpus for automatic speech recognition
Aye Nyein Mon ... Ye Kyaw Thu
01 Aug 2019

Efficient Neighborhood Covering Reduction with Submodular Function Optimization
Qiang Chen ... Xiaodong Yue
01 Jan 2017
