This study conducted an acoustic-prosodic mapping analysis of emotional prosody in Mandarin Chinese, using a validated audiometric corpus of 450 disyllabic words. The spoken words covered five vocal emotion categories produced by a female speaker: angry, sad, happy, fearful, and neutral. A machine-learning approach was adopted to map the key acoustic-prosodic features of Mandarin emotional vocalization. The results revealed a distinctive acoustic profile for each emotion, marked by variations in fundamental frequency (F0), intensity, speaking rate, and voice quality. Emotional utterances consistently exhibited higher mean F0 values than neutral expressions, with fear showing the highest F0 peak. Angry and happy utterances showed greater vocal intensity and a faster speaking rate than fearful and sad expressions. Anger was associated with a creakier voice quality, whereas sadness corresponded with a breathier voice quality. The current findings are limited by the use of a single-speaker corpus; ongoing efforts aim to expand the corpus with more speakers to test the generalizability and scalability of the analysis approach in subsequent investigations.
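As a rough illustration of the kind of acoustic-prosodic mapping described above (not the authors' actual pipeline), the sketch below extracts a few simple prosodic descriptors (mean F0, F0 peak, an RMS intensity proxy, and utterance duration) from a set of WAV files and ranks their importance for emotion classification with a random forest. The use of librosa, pYIN pitch tracking, the random forest, and the file-naming scheme are all assumptions for illustration only.

```python
# Illustrative sketch: prosodic feature extraction + feature ranking.
# Assumes a hypothetical directory "corpus/" with files named like "happy_001.wav".
import glob
import os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def prosodic_features(path):
    y, sr = librosa.load(path, sr=None)
    # pYIN pitch track; NaNs mark unvoiced frames
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    f0 = f0[~np.isnan(f0)]
    rms = librosa.feature.rms(y=y)[0]
    duration = len(y) / sr
    return [
        float(np.mean(f0)) if f0.size else 0.0,  # mean F0 (Hz)
        float(np.max(f0)) if f0.size else 0.0,   # F0 peak (Hz)
        float(np.mean(rms)),                     # intensity proxy (mean RMS energy)
        duration,                                # duration (s), a crude speaking-rate proxy
    ]

paths = sorted(glob.glob("corpus/*.wav"))
X = np.array([prosodic_features(p) for p in paths])
labels = [os.path.basename(p).split("_")[0] for p in paths]  # emotion label from filename

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
for name, imp in zip(["mean_F0", "max_F0", "mean_RMS", "duration"], clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

A full analysis of the kind reported in the abstract would also require voice-quality measures (e.g., indices of creak and breathiness) and a proper speaking-rate estimate, which this sketch omits.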