Abstract

This paper describes a large-vocabulary Taiwanese (Min-nan) speech recognition system. Because Taiwanese exhibits a severe multiple-pronunciation phenomenon, caused in part by tone sandhi, a statistical pronunciation modeling technique based on tonal features is used. The system is speaker independent and was trained on a bilingual Mandarin/Taiwanese speech corpus to compensate for the lack of a pure Taiwanese speech corpus. The search network is built from Chinese-character nodes, so the recognizer outputs a Chinese character string directly. Experiments show that the proposed approaches reduce the character error rate from 21.50% to 11.97%, a relative reduction of about 44%.

1. Introduction

Over the past decades, Mandarin has been the most widely studied Chinese dialect in the speech recognition community because of its huge number of speakers. This is easy to understand, since Mandarin is the official language of the major Chinese societies, including Mainland China and Taiwan. However, several important regional dialects other than Mandarin are still widely used in daily life in all Chinese societies. Unlike the dialects of a typical Western language, these dialects are mutually unintelligible. From some linguistic viewpoints, they can even be regarded as different languages, like the languages of Europe. In Taiwan, the most widely used dialect (or language) after Mandarin is Taiwanese. It is the mother tongue of more than 75% of the population of Taiwan. It belongs to a larger Chinese dialect family called Min-nan (or Southern-Min, Southern-Hokkian), which is also spoken by many overseas Chinese living in Singapore, Malaysia, the Philippines, and other parts of Southeast Asia. This language is estimated to have more than 49 million speakers and to rank 21st in the world [10]. This paper is concerned with constructing a speech recognition system for Taiwanese.
According to some linguists, the difference between Mandarin and Taiwanese is much greater than that between ordinary dialects of the same language. From some linguistic viewpoints, they can be regarded as two different languages, just like French and English. However, there is still much similarity between Mandarin and Taiwanese at the phonetic, lexical, and even syntactic levels. Because of this similarity, it is natural to use speech and text corpora from both languages to help construct a speech recognizer for Taiwanese.

Taiwanese, like Mandarin, is a tonal language. Pitch information, which is usually ignored in speech recognition for Western languages, is important for understanding meaning and discriminating homonyms. In this paper, we use pitch information as an additional feature alongside the widely used mel-frequency cepstral coefficients (MFCC). Another important issue in Taiwanese speech recognition is the severe problem of multiple pronunciations for each morpheme. Here a morpheme may be a Chinese character (usually called hanzi in China, or kanji in Japan). A statistical pronunciation modeling technique is shown to be very helpful in addressing the issue of multiple pronunciations.

This paper is organized as follows. Section 2 introduces background knowledge, including the Taiwanese phonetic and linguistic knowledge essential to speech recognition and the speech corpus used to train the acoustic models. Section 3 builds a baseline system using a sub-syllabic CHMM modeling approach. Section 4 focuses on the issues of pitch tracking and multiple pronunciations. Finally, experimental results and conclusions are presented in Sections 5 and 6, respectively.
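The idea of using pitch as an extra feature alongside MFCC, mentioned in the introduction, can be illustrated with a minimal sketch. It is not the paper's method: the text above does not say how the pitch feature is computed or combined with the cepstral features. The sketch assumes a per-frame F0 contour in Hz with 0.0 marking unvoiced frames, takes log-F0, fills unvoiced frames by linear interpolation, and appends log-F0 and its first-order delta to each MFCC vector. The function names `log_f0_track`, `deltas`, and `append_pitch` are hypothetical.

```python
import math


def log_f0_track(f0_hz):
    """Return a log-F0 contour; unvoiced frames (0.0) are filled by
    linear interpolation between the nearest voiced neighbours.
    (Assumption: this smoothing scheme is illustrative only.)"""
    voiced = [i for i, f in enumerate(f0_hz) if f > 0.0]
    if not voiced:
        # No voiced frame at all: fall back to a flat contour.
        return [0.0] * len(f0_hz)
    filled = []
    for i, f in enumerate(f0_hz):
        if f > 0.0:
            filled.append(math.log(f))
            continue
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            # Leading unvoiced frames copy the first voiced value.
            filled.append(math.log(f0_hz[right]))
        elif right is None:
            # Trailing unvoiced frames copy the last voiced value.
            filled.append(math.log(f0_hz[left]))
        else:
            w = (i - left) / (right - left)
            filled.append((1.0 - w) * math.log(f0_hz[left])
                          + w * math.log(f0_hz[right]))
    return filled


def deltas(track):
    """Central-difference delta with the edge frames replicated."""
    n = len(track)
    return [(track[min(i + 1, n - 1)] - track[max(i - 1, 0)]) / 2.0
            for i in range(n)]


def append_pitch(mfcc_frames, f0_hz):
    """Append [log-F0, delta log-F0] to every MFCC frame."""
    if len(mfcc_frames) != len(f0_hz):
        raise ValueError("MFCC and F0 sequences must have the same length")
    lf0 = log_f0_track(f0_hz)
    d_lf0 = deltas(lf0)
    return [list(m) + [p, dp] for m, p, dp in zip(mfcc_frames, lf0, d_lf0)]


if __name__ == "__main__":
    # Four dummy 12-dimensional MFCC frames; the middle two are unvoiced.
    mfcc = [[0.0] * 12 for _ in range(4)]
    f0 = [100.0, 0.0, 0.0, 200.0]
    for frame in append_pitch(mfcc, f0):
        print(len(frame), [round(x, 3) for x in frame[-2:]])
```

Interpolating over unvoiced frames keeps the feature stream continuous, which is one common way to make pitch usable in a frame-based acoustic model; other choices, such as a separate voicing flag, would work equally well for this illustration.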
