Abstract

Speech recognition technology is a convenient modality for elderly people with limited dexterity and eyesight, since it allows them to operate smartphones and other devices, but speech recognition accuracy declines as people get older. Our corpus of elderly Japanese speech, called EARS (Elderly Adults Read Speech), includes many speakers over 80 years old, and we have used it to create acoustic models that improve recognition accuracy for elderly speech [1]. Although the acoustic features of elderly people's speech differ from those of younger adults [2, 3], few studies have compared the acoustic features of the speech of elderly and super-elderly Japanese people. In this paper, we investigated age-related changes in fundamental frequency (F0), formant frequencies, laryngeal noise, MFCCs, and vowel centralization in elderly to super-elderly speakers (60-98 years old), using speech data from the EARS corpus and the S-JNAS corpus of elderly Japanese speech. We found that the speech of both genders exhibited an increase in F1 and a decrease in F2 of the vowel /i/, similar trends in the means and standard deviations of some MFCCs, and a decrease in the mean and standard deviation of shimmer with increasing age. In contrast, different aging-related dynamics were observed for male and female speakers in fundamental frequency, formant frequencies other than those listed above, vowel centralization, and most other MFCCs.
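For illustration only, below is a minimal sketch of how the acoustic measures named above (F0, formants, shimmer, MFCCs) could be extracted from a recording. The paper does not specify its extraction tools; the libraries (praat-parselmouth, librosa), the analysis parameters, and the file name are assumptions, and a real study would segment individual vowels rather than take whole-utterance values.

```python
# Hypothetical feature-extraction sketch; not the authors' pipeline.
import librosa
import parselmouth
from parselmouth.praat import call

wav_path = "speaker_0001.wav"  # assumed utterance file name

snd = parselmouth.Sound(wav_path)

# Fundamental frequency (F0): mean over voiced frames.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
mean_f0 = f0[f0 > 0].mean()

# Formants F1/F2, sampled here at the utterance midpoint; per-vowel
# segmentation would be needed to study /i/ specifically.
formants = snd.to_formant_burg()
t_mid = snd.duration / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)

# Shimmer (local), a laryngeal-noise measure, via Praat's PointProcess
# (pitch floor/ceiling and shimmer arguments are Praat defaults).
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 600)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)

# MFCCs: per-coefficient mean and standard deviation.
y, sr = librosa.load(wav_path, sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfcc_mean, mfcc_std = mfcc.mean(axis=1), mfcc.std(axis=1)
```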
