Formosa Speech in the Wild Corpus for Improving Taiwanese Mandarin Speech-Enabled Human-Computer Interaction

Yuan-Fu Liao, Jozef Juhar, Wu-Hua Hsu, Yung-Hsiang Shawn Chang, Matus Pleva, Yu-Chen Lin

doi:10.1007/s11265-019-01483-4

Abstract

Mandarin in Taiwan differs notably from other variants of Mandarin in lexical use and accent. However, from an investment perspective, it remains debated whether general-purpose Mandarin speech recognition (MSR) systems are sufficient to support human-computer interaction in Taiwan. To address this question, we established the Formosa (an ancient name of Taiwan given by the Portuguese) Speech in the Wild (FSW) project (Liao 2018) to (1) collect large-scale Taiwanese Mandarin speech to boost the development of Taiwanese-specific MSR techniques, and (2) host a Formosa Speech Recognition (FSR) challenge (Liao 2018) to promote the corpus and to evaluate the performance of the available Taiwanese-specific MSR systems. The FSW project has focused on transcribing spontaneous Taiwanese Mandarin speech selected from real-life, multi-genre broadcast radio speech provided by Taiwan’s National Education Radio (2018). We plan to publicly release about 3000 hours of speech data at the end of 2019. FSR-2018 (Liao 2018), the culmination of FSW’s events in 2018, featured a Taiwanese broadcast Mandarin speech recognition evaluation campaign using the released corpora. The challenge was also an official activity (Liao 2018) of the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) [22]. At the end of 2018, the first four volumes of the FSW Corpus, NER-Trs-Vol1∼4, totaling 610.2 hours of speech data, were released to support two events: the Formosa Grand Challenge, Talk to AI (FGC) (Ministry of Science And Technology Taiwan 2018) (Dec. 2017 ∼ Mar. 2019) and the FSR-2018 challenge (Liao 2018) (June 2018 ∼ Nov. 2018), which had 147 and 27 participating teams, respectively. For FSR-2018, 16 teams submitted 30 recognition results on the final-test set. The evaluation results revealed that the best Taiwanese-specific MSR system achieved a Chinese character error rate (CER) of 8.1%. For reference, iFlyTek’s (ISCSLP 2018) and Google’s (2018) commercial MSR systems, which were not optimized for this task, achieved CERs of 18.8% and 20.6%, respectively. Taken together, we argue that a Taiwanese-specific MSR system is necessary for improving the performance of Taiwanese Mandarin speech-enabled human-computer interaction.
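
Since the results above are reported as character error rates, the following minimal sketch illustrates how a CER is commonly computed: the character-level Levenshtein (edit) distance between a reference transcript and a recognizer hypothesis, divided by the number of reference characters. This is an assumption about the general definition, not a description of the challenge's scoring tool, whose text normalization rules are not given here. The function names `edit_distance` and `cer`, the whitespace stripping, and the example sentences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length.

    Assumption: whitespace is ignored; a real evaluation may also
    normalize punctuation, digits, or character variants.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the FSW corpus.
    reference = "今天天氣很好"
    hypothesis = "今天天汽好"  # one substitution and one deletion
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Under this definition, a CER can exceed 100% when the hypothesis contains many insertions, which is why it is reported as an error rate rather than an accuracy.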

Full Text