Markov modeling of Mandarin Chinese for decoding the phonetic sequence into Chinese characters

Hung-Yan Gu, Chiu-Yu Tseng, Lin-Shan Lee

doi:10.1016/0885-2308(91)90004-a

Abstract

In many applications, Chinese information is provided as sequences of phonetic symbols, which must be decoded into the corresponding Chinese character sequences (sentences). Phonetic input of Chinese characters into computers is a typical example. The decoding is difficult primarily because of the high degree of ambiguity caused by the large number of homonyms in Mandarin Chinese. In this paper, Markov models for Mandarin Chinese are developed to solve this decoding problem effectively, and an efficient algorithm based on dynamic programming and suitable for parallel processing is proposed that fully searches the exponentially large solution space in polynomial time. Extensive experiments were performed, and the results show that appropriate models, trained under proper conditions, solve the problem effectively and that the techniques developed here are also suitable for real-time applications.
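To make the dynamic-programming idea concrete, the following is a minimal sketch of Viterbi-style decoding under a first-order (bigram) Markov assumption. It is not the authors' algorithm: the abstract does not specify the model order, the training procedure, or the parallel formulation. The lexicon `CANDIDATES`, the probability table `BIGRAM`, the smoothing constant `FLOOR`, and the function names are all hypothetical values chosen only for illustration.

```python
import math

# Hypothetical toy lexicon: homonym candidates for each phonetic syllable.
CANDIDATES = {
    "ta": ["他", "她"],
    "shi": ["是", "市", "事"],
    "ren": ["人", "任"],
}

# Hypothetical bigram probabilities P(current | previous).
# "<s>" marks the start of the sentence. These numbers are made up.
BIGRAM = {
    ("<s>", "他"): 0.6,
    ("<s>", "她"): 0.4,
    ("他", "是"): 0.5,
    ("她", "是"): 0.5,
    ("是", "人"): 0.3,
}

# Assumed floor probability for character pairs missing from the table.
FLOOR = 1e-6


def log_p(prev, cur):
    """Log-probability of `cur` following `prev` under the toy bigram model."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def decode(syllables):
    """Return the most probable character string for a syllable sequence.

    For each position, only the best-scoring path ending in each candidate
    character is kept, so the work grows as O(n * k^2) for n syllables with
    at most k candidates each, instead of enumerating all k^n sentences.
    """
    # best maps a final character to (log-probability, path so far).
    best = {"<s>": (0.0, [])}
    for syl in syllables:
        nxt = {}
        for cur in CANDIDATES[syl]:
            # Pick the best predecessor for this candidate character.
            score, path = max(
                ((s + log_p(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


if __name__ == "__main__":
    print(decode(["ta", "shi", "ren"]))
```

Because the score of each candidate at one position depends only on the scores at the previous position, the inner loop over candidates can in principle be evaluated independently for each candidate, which is one plausible reading of why such a search suits parallel processing.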
