Native listeners’ evaluation of natural and synthesized prosody in Mandarin of American learners

Ying Chen, Li Liu, Xueqin Zhao

doi:10.1121/1.4970663

Abstract

Compared to duration and intensity, F0 has been found to be the most difficult acoustic parameter to acquire in L2 prosody, especially post-focus compression of F0 (Chen, 2016). This study examined the production of prosodic focus in Mandarin by three groups: native Beijing speakers, early American learners, and late American learners. PENTAtrainer2 (Xu & Prom-on, 2014), a data-driven system for prosody analysis and synthesis, was used to model and synthesize F0 contours based on speaker groups and layered annotations of communicative functions: lexical, sentential, and focal. Native Mandarin speakers were recruited to identify focus status (neutral, initial, medial, or final focus) and to rate the naturalness (on a 1-5 scale) of original and synthesized speech. Results reveal that natural speech was recognized and rated better than synthesized speech, early learners’ speech better than late learners’ speech, focused sentences better than no-focus sentences, and initial and medial focus better than final focus. Tones of focused words interacted with focus status of the sentence and speaker group. Future work will involve pairwise shape comparisons, root-mean-square error, and Pearson’s correlation coefficient between natural and synthesized F0 contours. [This work was supported by the National Science Foundation of China 61573187 and Fundamental Research Funds for the Central Universities in China NJUSTWGY14001.]
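The two numerical measures named for future work, root-mean-square error and Pearson’s correlation coefficient, can be illustrated with a minimal sketch. It assumes that a natural and a synthesized F0 contour have already been time-normalized to the same number of sample points; the function names and the F0 values below are hypothetical illustrations, not the authors’ procedure or data.

```python
import math


def rmse(natural, synthesized):
    """Root-mean-square error between two equal-length F0 contours."""
    if len(natural) != len(synthesized) or not natural:
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(n - s) ** 2 for n, s in zip(natural, synthesized)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(natural, synthesized):
    """Pearson's correlation coefficient between two equal-length F0 contours."""
    count = len(natural)
    if count != len(synthesized) or count < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_n = sum(natural) / count
    mean_s = sum(synthesized) / count
    cov = sum((n - mean_n) * (s - mean_s) for n, s in zip(natural, synthesized))
    var_n = sum((n - mean_n) ** 2 for n in natural)
    var_s = sum((s - mean_s) ** 2 for s in synthesized)
    if var_n == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a flat contour")
    return cov / math.sqrt(var_n * var_s)


# Hypothetical contours in Hz, sampled at five time-normalized points.
natural_f0 = [200.0, 220.0, 240.0, 220.0, 200.0]
synthesized_f0 = [205.0, 225.0, 245.0, 225.0, 205.0]

print(rmse(natural_f0, synthesized_f0))
print(pearson_r(natural_f0, synthesized_f0))
```

In this made-up case the synthesized contour is the natural one shifted up by a constant 5 Hz, so the error measure registers the offset while the correlation treats the two shapes as identical. That difference is one reason the two measures are usually reported together.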
