A quantitative model of talker normalization by native and non-native speakers

Si Chen, Wing Kam Fung, Puiyin Lau, Caicai Zhang, Yike Yang, Bei Li

doi:10.1121/1.5136843

Abstract

Talker variability affects native and non-native speakers in speech perception of segmentals (e.g., Bent et al., 2010) and suprasegmentals (e.g., Wong and Diehl, 2003). Zhang and Chen (2016) reported that gender-specific F0 range may contribute significantly to Cantonese tone perception. However, a full understanding of how the population F0 distribution affects tone identification is still lacking. This study aims to bridge this gap by modeling tone distributions and testing how deviations in the distribution parameters affect tone identification by native and non-native speakers. Statistical modeling of a Cantonese speech corpus of 68 speakers showed that F0 values of three Cantonese tones follow skew-normal distributions with three parameters: location, shape, and scale. We then conducted two experiments with 28 Cantonese and 28 Mandarin listeners, using both tones naturally produced by 34 Cantonese speakers and manipulated tones with F0 values generated from simulated distributions. A multinomial mixed effects model revealed significant main effects of the location and shape parameters. Locally weighted scatterplot smoothing curves also differed dramatically between native and non-native listeners, indicating an effect of long-term F0 distribution representations on tonal identification. The results thus offer useful insights into how parametric representations of phonetic distributions are used in tone identification.
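
As an illustration of the three-parameter skew-normal family named in the abstract, the following minimal sketch evaluates the density and draws simulated values from given location, scale, and shape parameters. It is not the authors' code. The function names are hypothetical, and the parameter values used with it (in Hz) are arbitrary placeholders rather than estimates from the corpus.

```python
import math
import random


def skewnorm_pdf(x, location, scale, shape):
    """Skew-normal density: (2 / scale) * phi(z) * Phi(shape * z), z = (x - location) / scale."""
    z = (x - location) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)       # standard normal pdf at z
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))  # standard normal cdf at shape * z
    return 2.0 / scale * phi * big_phi


def skewnorm_mean(location, scale, shape):
    """Theoretical mean: location + scale * delta * sqrt(2 / pi)."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    return location + scale * delta * math.sqrt(2.0 / math.pi)


def skewnorm_sample(n, location, scale, shape, seed=0):
    """Draw n values using z = delta * |u0| + sqrt(1 - delta^2) * u1 with u0, u1 standard normal."""
    rng = random.Random(seed)
    delta = shape / math.sqrt(1.0 + shape * shape)
    values = []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1
        values.append(location + scale * z)
    return values
```

With shape set to 0 the density reduces to an ordinary normal distribution. Positive shape values skew the distribution toward higher F0, and negative values skew it toward lower F0. Shifting location or shape in such a simulation is one way to picture the "deviated" distributions the abstract refers to.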

Full Text