Emotional prosody refers to the ways in which tone of voice is modulated to convey emotions, feelings, and attitudes. Previous studies have examined the perception of emotional prosody and whether native (L1) speakers have an in-group advantage over non-native speakers in recognizing the emotional prosody of their own cultural group. However, little is known about whether these findings, obtained in non-tonal languages, generalize to tonal languages. Mandarin Chinese, a tonal language, uses tone of voice to encode word meanings in addition to emotional prosody. This study investigates the perception of emotional prosody in Mandarin Chinese using an emotion judgment task, focusing on the effects of emotion type (e.g., neutral, joy, anger, sadness) and syllable length (e.g., monosyllable, disyllable, trisyllable, and sentence). Three groups participated: 20 native Chinese speakers (native group), 20 L1-English L2-Chinese learners (second language, or L2, group), and 20 native English speakers with no experience of learning Chinese (non-native group). The results revealed that all three groups identified emotional prosody in Mandarin Chinese words and sentences at well above chance level. Moreover, the native group and the L2 group showed an in-group advantage over the non-native group in recognizing emotional prosody, highlighting the impact of linguistic experience, in addition to cultural background, on the perception of emotional prosody. Notably, the effects of emotion type and syllable length on the perception of emotional prosody differed across the three groups. The native group had difficulty identifying positive emotional prosody, whereas both the L2 group and the non-native group showed improved accuracy as syllable length increased, an effect that interacted with emotion type.