Speech timing and cross-linguistic studies towards computational human modeling

Yoshinori Sagisaka, Chatchawarn Hansakunbuntheung, Hiroaki Kato, Minoru Tsuzaki, Shizuka Nakamura

doi:10.1109/icsda.2009.5278386

Abstract

In this paper, we introduce Japanese segmental duration characteristics and their computational modeling, which we have studied for around three decades in speech synthesis. We also present a series of experimental results on the loudness dependence of duration perception. These computational duration models and perceptual studies of how sensitivity to duration errors depends on loudness offer some insights for computational human modeling of spoken language capability. As a first attempt to determine how these findings could be efficiently employed in other fields such as language learning, we introduce our current efforts on the objective evaluation of second-language speaking skill and the AESOP (Asian English Speech cOrpus Project) research consortium, in which researchers in Asian countries have started to work together.

Full Text