Ageing Voices: The Effect of Changes in Voice Parameters on ASR Performance

Ravichander Vipperla, Joe Frankel, Steve Renals

doi:10.1155/2010/525783

Abstract

With ageing, human voices undergo several changes, which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we examined the effect of these changes on Automatic Speech Recognition (ASR) and found that the Word Error Rate (WER) on older voices is 10% absolute higher than on adult voices. Subsequently, we compared several voice source parameters, including fundamental frequency, jitter, shimmer, harmonicity, and cepstral peak prominence, of adult and older males. Several of these parameters show statistically significant differences between the two groups. However, artificially increasing the jitter and shimmer measures does not affect ASR accuracy significantly. Artificially lowering the fundamental frequency degrades ASR performance marginally, but this drop in performance can be overcome to some extent using Vocal Tract Length Normalisation (VTLN). Overall, we observe that the changes in the voice source parameters do not have a significant impact on ASR performance. Comparison of the likelihood scores of all the phonemes for the two age groups shows that there is a systematic mismatch in the acoustic space of the two age groups. Comparison of the phoneme recognition rates shows that mid vowels, nasals, and phonemes that depend on the ability to create constrictions with the tongue tip for articulation are more affected by ageing than other phonemes.
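To make the jitter and shimmer measures named in the abstract concrete, the sketch below computes the commonly used "local" variants: the mean absolute difference between consecutive glottal periods (jitter) or consecutive peak amplitudes (shimmer), divided by the mean value. This is an illustration under assumptions, not the authors' procedure: the function names and the input sequences are hypothetical, and the study may have used other jitter and shimmer variants.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    normalised by the mean of all values."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive measurements")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the glottal period (in seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(peak_amplitudes)


# Hypothetical period sequence alternating between 10 ms and 11 ms.
periods = [0.010, 0.011, 0.010, 0.011]
print(f"local jitter: {local_jitter(periods):.4f}")
```

Both measures are ratios, so a perfectly periodic signal with constant amplitude gives 0 for each.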

Highlights

Older people form an important user group for a variety of spoken dialogue systems.
In order to obtain the sentence boundaries and speaker turn alignments in each of these one-hour-long audio recordings, forced alignment was performed on each recording using acoustic models trained on 73 hours of meetings data recorded by the International Computer Science Institute (ICSI), 13 hours of meeting corpora from the National Institute of Standards and Technology (NIST), and 10 hours of corpora from the Interactive Systems Laboratory (ISL) [20].
In order to understand the effect of a reduction in F0 on Automatic Speech Recognition (ASR) performance, we artificially reduce the F0 by 10% and compare the Word Error Rate (WER) of the original waveforms and the modified waveforms.
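The WER comparison mentioned above relies on the standard definition of Word Error Rate: the minimum number of word substitutions, deletions, and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that definition, not the scoring tool used in the study; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

When scoring a whole test set, the error counts are usually summed over all utterances and divided by the total number of reference words, rather than averaging per-utterance rates. A difference quoted as "absolute" is a difference in percentage points between two such rates.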

Summary

Introduction

Older people form an important user group for a variety of spoken dialogue systems. Systems with speech-based interactions can be useful for older people with mobility restrictions and visual impairment. Loss of elasticity [1], stiffening of the thorax, reduction in respiratory muscle strength [2], and loss of diaphragm strength [3] are the most significant changes. These lead to a reduction in forced expiratory volume and lung pressure in older people, as a result of which there is a decline in the amount of air that moves in and out and in the efficiency with which it moves [4, 5]. Although there have been numerous studies on the effects of ageing on voice, there has been limited work to understand how these changes affect the performance of Automatic Speech Recognition (ASR) systems. Apart from the difference in acoustics, older people appear to differ in linguistic characteristics when interacting with Spoken Dialogue Systems (SDS) [16]. They tend to use more words than younger adults in their queries and talk to systems as if they were humans [17]. The results are shown in graphs, and the relevant numbers are tabulated in the Appendix.

ASR Performance

Voice Parameter Analysis

Phoneme Acoustic Likelihoods and Phoneme Recognition Rates

Findings

Discussion

Conclusion


Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Jan 1, 2010
Citations: 59
License type: cc-by

