Abstract

Automatic word segmentation is the first step in Vietnamese text information processing and an important foundation for cross-language information processing tasks in China and Vietnam. Because Vietnamese is an isolating, tonal language, a syllable can form a word on its own or combine with the syllable to its left and/or right to form a new word. Spaces therefore separate syllables rather than words, so automatic word segmentation of Vietnamese cannot rely on spaces alone. This paper takes automatic word segmentation of Vietnamese as its research object. First, Vietnamese sentences are roughly segmented with the N-shortest path model. Then, the syllables in each sentence are abstracted into a directed acyclic graph. Finally, the word segmentation is obtained by computing the shortest path with the help of the BEMS marking system. The results show that the proposed algorithm performs satisfactorily on Vietnamese word segmentation.
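
The graph-based step can be illustrated with a minimal sketch. It is not the paper's implementation: it finds only the single shortest path rather than the N shortest, and the lexicon, its probabilities, the fallback value for unknown syllables, and the function names are all hypothetical. Each boundary between syllables is a node, each candidate word is an edge weighted by its negative log probability, and the lowest-cost path from the first node to the last gives the segmentation. The BEMS tags are read here in the usual way (B = begin, M = middle, E = end, S = single-syllable word), which is an assumption about the paper's marking system.

```python
import math

# Hypothetical toy lexicon: word -> illustrative probability (not from the paper).
LEXICON = {
    "tôi": 0.05,
    "là": 0.05,
    "sinh": 0.005,
    "viên": 0.005,
    "sinh viên": 0.02,
    "câu lạc bộ": 0.01,
}
UNKNOWN_PROB = 1e-6  # assumed fallback for single syllables missing from the lexicon
MAX_WORD_LEN = 4     # assumed cap on the number of syllables per word


def segment(sentence):
    """Return the lowest-cost segmentation as a list of words (lists of syllables)."""
    syllables = sentence.split()
    n = len(syllables)
    # Node i is the boundary before syllable i; edge (i, j) is the word syllables[i:j].
    best_cost = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best_cost[0] = 0.0
    # Nodes are visited in left-to-right order, which is a topological order of the DAG.
    for j in range(1, n + 1):
        for i in range(max(0, j - MAX_WORD_LEN), j):
            word = " ".join(syllables[i:j])
            if word in LEXICON:
                prob = LEXICON[word]
            elif j - i == 1:
                prob = UNKNOWN_PROB  # keep the graph connected for unknown syllables
            else:
                continue  # multi-syllable candidates must be in the lexicon
            cost = best_cost[i] - math.log(prob)
            if cost < best_cost[j]:
                best_cost[j] = cost
                back[j] = i
    # Follow the back-pointers from the last node to recover the path.
    words = []
    j = n
    while j > 0:
        i = back[j]
        words.append(syllables[i:j])
        j = i
    words.reverse()
    return words


def bems_tags(words):
    """Label every syllable with B, E, M, or S according to its position in its word."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


words = segment("tôi là sinh viên")
print([" ".join(w) for w in words])  # ['tôi', 'là', 'sinh viên']
print(bems_tags(words))              # ['S', 'S', 'B', 'E']
```

With this toy lexicon, "sinh viên" is kept as one word because the single edge covering both syllables costs less than the two single-syllable edges combined.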
