Abstract

Pressed voice ("rikimi" in Japanese) is a voice quality related to the vibratory patterns of the vocal folds. It was recently shown that pressed voice carries important paralinguistic information in Japanese, indicating the emotional or attitudinal state of the speaker. In the present work, several acoustic features are investigated with the aim of characterizing pressed voice acoustically and detecting it automatically. Analysis of pressed voice samples extracted from natural conversational speech first shows that irregularity in periodicity (as in creaky and harsh voices) is a common but not strictly determinant feature of pressed voice. Spectral analysis shows that parameters related to spectral slope are effective in identifying some of the pressed voice samples, but fail when vowels are nasalized or when double-beating occurs within a glottal cycle. Temporal analyses of pressed and nonpressed creaky voices indicate that diplophonia (the simultaneous production of two separate tones, arising when the vocal folds oscillate out of phase) frequently occurs in nonpressed creaky voices but rarely appears in pressed ones. Further temporal analysis of electroglottograph (EGG) waveforms of acted speech shows the same trends as those obtained for natural speech, indicating that information about the absence of diplophonia can potentially be used for pressed voice detection.
