An evaluation of sentence selection methods on the different phone-sized units for constructing Indonesian speech corpus

Muljono Muljono, Nurul Anisa Sri Winarsih, Agus Harjoko, Catur Supriyanto

doi:10.1007/s10772-019-09662-1

Abstract

Collecting a phonetically balanced text corpus is an important step in developing automatic speech recognition and text-to-speech systems. Such a corpus should have a small number of sentences yet contain all phonetic units, such as monophone, triphone, and pentaphone units. Existing approaches include the least-to-most greedy algorithm (LTM + Greedy) and its variant for selecting a minimum sentence set. The variant differs in the sentence scoring method, which affects the number of selected sentences. In this paper, we evaluate the sentence scoring methods of Zhang and Suyanto within the LTM + Greedy algorithm. The scoring methods are applied to the triphone and pentaphone units of the sentence collection. Triphone and pentaphone units offer higher-quality synthesized speech than monophone units. The dataset consists of Indonesian sentences collected from holy book translations, news, novels, dialogs, monologues, and question sentences. In total, 115,489 sentences are used in the experiments. Based on the experiments, LTM + Greedy with Suyanto's scoring produces a smaller number of sentences that contain a large number of phone units.
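
The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is a simplified reading of least-to-most greedy selection, not the authors' implementation: units are visited from rarest to most frequent, and for each still-uncovered unit the candidate sentence with the highest score is chosen. The scoring used here (number of not-yet-covered units in the sentence) is a placeholder assumption and is neither Zhang's nor Suyanto's scoring formula. The names `phone_units` and `ltm_greedy`, and the representation of a sentence as a list of phone symbols, are hypothetical.

```python
from collections import Counter


def phone_units(phones, n):
    """Return the set of n-phone units in a phone sequence
    (n=3 for triphones, n=5 for pentaphones)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}


def ltm_greedy(sentences, n=3):
    """Select sentence indices so that every n-phone unit is covered.

    Units are processed from least to most frequent. For each uncovered
    unit, the sentence containing it that adds the most uncovered units
    is selected (placeholder score, an assumption of this sketch).
    """
    unit_sets = [phone_units(s, n) for s in sentences]
    # Count in how many sentences each unit occurs.
    freq = Counter(u for units in unit_sets for u in units)

    covered = set()
    selected = []
    # Rarest units first; ties are broken by the unit itself for determinism.
    for unit in sorted(freq, key=lambda u: (freq[u], u)):
        if unit in covered:
            continue
        candidates = [i for i, units in enumerate(unit_sets) if unit in units]
        best = max(candidates, key=lambda i: len(unit_sets[i] - covered))
        selected.append(best)
        covered |= unit_sets[best]
    return selected


if __name__ == "__main__":
    # Toy "sentences" given as phone sequences (illustrative data only).
    toy = [list("abcd"), list("abc"), list("xyz"), list("bcd")]
    print(ltm_greedy(toy, n=3))
```

Swapping the `key` function passed to `max` is where a different sentence scoring method would plug in, which is the part the paper compares.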

Journal: International Journal of Speech Technology
Publication Date: Dec 23, 2019
Citations: 1

Similar Papers

An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus
J.-S Zhang ... S Nakamura
IEICE Transactions on Information and Systems | VOL. E91-D
01 Mar 2008

Modified Least-to-Most Greedy Algorithm to Search a Minimum Sentence Set
Suyanto
01 Jan 2006

QHAN: Quantum-inspired Hierarchical Attention Mechanism Network for Question Answering
Peng Guo ... Panpan Wang
International Journal on Artificial Intelligence Tools | VOL. 32
01 Aug 2023

A submodular optimization approach to sentence set selection
Yusuke Shinohara
01 May 2014
