Thai Word Segmentation with Hidden Markov Model and Decision Tree

Poramin Bheganan,Richi Nayak,Yue Xu

doi:10.1007/978-3-642-01307-2_10

Abstract

Written Thai is one of the languages that do not mark word boundaries. To discover the meaning of a document, its text must first be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method for segmenting Thai text that combines a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. A hidden Markov model is then used to merge candidate syllables into words. The identified words are verified against a lexical dictionary, and a decision tree is employed to discover words that the lexical dictionary does not identify. Documents used in the litigation process of Thai court proceedings were used in the experiments. The segmented words produced by the proposed method outperform the results obtained by other existing methods.

Keywords: Hidden Markov Model, Thai word segmentation, Decision tree
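
The syllable-merging step can be illustrated with a small sketch. This is not the authors' model: the abstract does not specify the HMM's states or parameters, so the two-state tagging scheme (B for a syllable that begins a word, I for one that continues it), the romanized placeholder syllables, and every probability below are hypothetical values chosen only to show how Viterbi decoding could group syllables into words.

```python
import math

STATES = ("B", "I")  # assumption: B = syllable begins a word, I = continues it


def viterbi(syllables, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable B/I tag sequence for a list of syllables."""
    if not syllables:
        return []

    def emit(state, syl):
        # Unseen syllables get a small floor probability instead of zero.
        return math.log(emit_p[state].get(syl, floor))

    # best[i][s]: log-probability of the best path ending in state s at position i
    best = [{s: math.log(start_p[s]) + emit(s, syllables[0]) for s in STATES}]
    back = []
    for syl in syllables[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = best[-1][prev] + math.log(trans_p[prev][s]) + emit(s, syl)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    tags = [state]
    for pointers in reversed(back):
        state = pointers[state]
        tags.append(state)
    return tags[::-1]


def merge(syllables, tags):
    """Join syllables into words: a B tag opens a new word, an I tag extends it."""
    words = []
    for syl, tag in zip(syllables, tags):
        if tag == "B" or not words:
            words.append(syl)
        else:
            words[-1] += syl
    return words


# Hypothetical toy parameters, not estimated from any corpus.
start_p = {"B": 0.99, "I": 0.01}
trans_p = {"B": {"B": 0.5, "I": 0.5}, "I": {"B": 0.6, "I": 0.4}}
emit_p = {
    "B": {"sa": 0.5, "krap": 0.4, "wat": 0.05, "dee": 0.05},
    "I": {"wat": 0.5, "dee": 0.4, "sa": 0.05, "krap": 0.05},
}

syllables = ["sa", "wat", "dee", "krap"]  # placeholder romanized syllables
tags = viterbi(syllables, start_p, trans_p, emit_p)
words = merge(syllables, tags)
print(tags, words)
```

In a pipeline like the one the abstract describes, the merged words would then be checked against a lexical dictionary, with a separate classifier handling the words the dictionary does not recognize; neither step is shown here.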

Publication Date: Jan 1, 2009
Citations: 16
License type: mit