A new speech corpus of super-elderly Japanese for acoustic modeling

Meiko Fukuda, Ryota Nishimura, Hiromitsu Nishizaki, Koharu Horii, Yurie Iribe, Kazumasa Yamamoto, Norihide Kitaoka

doi:10.1016/j.csl.2022.101424

Abstract

The development of accessible speech recognition technology will allow the elderly to more easily access electronically stored information. However, the necessary level of recognition accuracy for elderly speech has not yet been achieved using conventional speech recognition systems, due to the unique features of the speech of elderly people. To address this problem, we have created a new speech corpus named EARS (Elderly Adults Read Speech), consisting of the recorded read speech of 123 super-elderly Japanese people (average age: 83.1), as a resource for training automated speech recognition models for the elderly. In this study, we investigated the acoustic features of super-elderly Japanese speech using our new speech corpus. In comparison to the speech of less elderly Japanese speakers, we observed a slower speech rate and extended vowel duration for both genders, a slight increase in fundamental frequency for males, and a slight decrease in fundamental frequency for females. To demonstrate the efficacy of our corpus, we also conducted speech recognition experiments using two different acoustic models (DNN-HMM and Transformer-based), trained with a combination of data from our corpus and speech data from three conventional Japanese speech corpora. When using the DNN-HMM trained with EARS and speech data from existing corpora, the character error rate (CER) was reduced by 7.8 percentage points (to just over 9%), compared to a CER of 16.9% when using only the baseline training corpora. We also investigated the effect of training the models with various amounts of EARS data, using a simple data expansion method. The acoustic models were also trained for various numbers of epochs without any modifications. When using the Transformer-based end-to-end speech recognizer, the CER was reduced by 3.0% (to 11.4%) by using a doubled EARS corpus with the baseline data for training, compared to a CER of 13.4% when only data from the baseline training corpora were used.
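
The abstract reports its results as character error rate (CER). As a reading aid, the following is a minimal sketch of how CER is conventionally computed: the character-level Levenshtein (edit) distance between a reference transcript and a recognizer hypothesis, divided by the reference length. The function names `edit_distance` and `cer` are illustrative, not taken from the paper, and the authors' actual scoring setup (text normalization, tokenization, toolkit) is not described in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only; these are not drawn from the EARS corpus.
    print(cer("こんにちは", "こんばんは"))
```

In this sketch the two illustrative strings differ in two of five characters, so the CER is 0.4. Because CER is counted per character rather than per word, it suits Japanese text, which has no whitespace word boundaries. The DNN-HMM figures in the abstract appear to describe an absolute difference in CER (16.9% minus 7.8 points gives roughly 9.1%).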

Journal: Computer Speech & Language
Publication Date: Jun 24, 2022
Citations: 1
License type: CC BY-NC-ND
