An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus

J.-S. Zhang, S. Nakamura

doi:10.1093/ietisy/e91-d.3.615

Abstract

An efficient way to develop large-scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost. It can be selected from a large mother text corpus by a greedy search algorithm. As more and more phonetic contextual effects are included, the number of distinct phonetic contextual units increases dramatically, making the search a nontrivial problem. To improve search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluates these algorithms in order to show their different characteristics. The experimental results show that the least-to-most-ordered methods achieve smaller objective sets in significantly less computation time than the conventional ones. This algorithm has already been applied to the development of a number of speech corpora, including ATRPTH, a large-scale phonetically rich Chinese speech corpus that played an important role in developing our multi-language translation system.
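The abstract does not spell out the search procedure, so the following is only a minimal sketch under stated assumptions. Each sentence is assumed to be represented as a set of phonetic contextual unit labels; the "conventional" method is taken to be plain greedy set cover (pick the sentence adding the most uncovered units); and the least-to-most-ordered variant is assumed to visit units from rarest to most frequent, choosing, for each still-uncovered unit, the best sentence among only those that contain it. The function names, the toy corpus, and the scoring by raw count of newly covered units are hypothetical; the paper's actual scoring (for example, frequency weighting or length normalization) may differ.

```python
from collections import Counter
from typing import Dict, List, Set

# Assumed representation: sentence ID -> set of phonetic contextual units
# (e.g. diphone or triphone labels) occurring in that sentence.
Corpus = Dict[str, Set[str]]


def greedy_select(corpus: Corpus) -> List[str]:
    """Conventional greedy set cover: repeatedly pick the sentence that
    covers the largest number of still-uncovered units."""
    uncovered: Set[str] = set().union(*corpus.values())
    remaining = dict(corpus)
    selected: List[str] = []
    while uncovered:
        # Every remaining sentence is rescored at every step.
        best = max(remaining, key=lambda s: len(remaining[s] & uncovered))
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


def least_to_most_select(corpus: Corpus) -> List[str]:
    """Assumed least-to-most-ordered variant: visit units from rarest to
    most frequent; for each unit still uncovered, score only the
    sentences that contain it."""
    # Frequency = number of sentences in which each unit occurs.
    freq = Counter(u for units in corpus.values() for u in units)
    # Inverted index: unit -> IDs of sentences containing it.
    by_unit: Dict[str, List[str]] = {}
    for sid, units in corpus.items():
        for u in units:
            by_unit.setdefault(u, []).append(sid)
    uncovered = set(freq)
    selected: List[str] = []
    # Ties in frequency are broken alphabetically for determinism.
    for unit in sorted(freq, key=lambda u: (freq[u], u)):
        if unit not in uncovered:
            continue  # already covered by an earlier selection
        # Candidate pool is limited to sentences containing this rare unit.
        best = max(by_unit[unit], key=lambda s: len(corpus[s] & uncovered))
        selected.append(best)
        uncovered -= corpus[best]
    return selected


# Hypothetical toy corpus with diphone-like unit labels.
toy_corpus: Corpus = {
    "s1": {"a-b", "b-c", "c-d"},
    "s2": {"a-b", "b-c"},
    "s3": {"c-d", "d-e"},
    "s4": {"d-e", "e-f"},
}

if __name__ == "__main__":
    print(greedy_select(toy_corpus))         # ['s1', 's4']
    print(least_to_most_select(toy_corpus))  # ['s4', 's1']
```

On this toy input both methods select two sentences, so it illustrates only the mechanics, not the set-size or speed differences reported in the paper. One plausible source of the speedup in the least-to-most ordering is visible in the sketch: each step scores only the sentences containing the current rare unit rather than rescoring the whole mother corpus.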


Journal: IEICE Transactions on Information and Systems
Publication Date: Mar 1, 2008
Citations: 5
License type: free


Similar Papers

An evaluation of sentence selection methods on the different phone-sized units for constructing Indonesian speech corpus
Muljono ... Catur Supriyanto
International Journal of Speech Technology | VOL. 23 | 23 Dec 2019

Proactive 2D model-based scan planning for existing buildings
Meida Chen ... Eyuphan Koc
Automation in Construction | VOL. 93 | 19 May 2018

Modified Least-to-Most Greedy Algorithm to Search a Minimum Sentence Set
Suyanto
01 Jan 2006

Case Study of Production Planning Optimization with Use of the Greedy and Tabu Search Algorithms
Łukasz Łampika ... Anna Burduk
01 Aug 2018
