A Hybrid Approach to Vietnamese Word Segmentation Using Part of Speech Tags

Dang Duc Pham, Son Bao Pham, Giang Tran

doi:10.1109/kse.2009.44

Publication Date: Oct 1, 2009

Citations: 35

Affiliation: Vietnam National University, Hanoi

Keywords: Vietnamese Word Segmentation, Part Of Speech Tags

Abstract

Word segmentation is one of the most important tasks in NLP. For Vietnamese, the particular features of the language make this task challenging, especially when determining word boundaries. To tackle Vietnamese word segmentation, in this paper we propose the WS4VN system, which uses a new approach based on the maximum matching algorithm combined with stochastic models that use part-of-speech information. The approach can resolve word ambiguity and choose the best segmentation for each input sentence. Our system gives a promising result, with an F-measure of 97%, higher than the results of existing publicly available Vietnamese word segmentation systems.
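
To make the maximum matching component concrete, the following is a minimal sketch of greedy forward maximum matching over syllables. It is not the WS4VN implementation: the function name `max_match`, the `max_len` parameter, and the three-entry toy lexicon are assumptions made for illustration, and the part-of-speech-based stochastic model described in the abstract is not shown.

```python
def max_match(syllables, lexicon, max_len=4):
    """Greedy forward maximum matching over a list of syllables.

    Illustrative sketch only; `lexicon` is a hypothetical set of known
    words, each written as space-separated syllables.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until a lexicon entry matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon, assumed for this example only.
LEXICON = {"học sinh", "sinh học", "học"}
sentence = "học sinh học sinh học".split()
result = max_match(sentence, LEXICON)
print(result)
```

On this input the greedy pass groups the first four syllables into two copies of "học sinh" and leaves a final "học", whereas the reading usually intended ("students study biology") is "học sinh" + "học" + "sinh học". This is the kind of ambiguity that purely dictionary-driven matching cannot resolve by itself, and that additional information such as part-of-speech tags is meant to address.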

Full Text