Khmer Treebank Construction via Interactive Tree Visualization

Bonpagna Kann, Thodsaporn Chay-Intr, Hour Kaing, Thanaruk Theeramunkong

doi:10.22146/ijitee.48545

Abstract

Although a number of studies have addressed the Khmer language in Natural Language Processing, and some resources exist for word segmentation and POS tagging, high-level syntactic resources such as treebanks and grammars are still lacking. This paper presents a semi-automatic framework for constructing a Khmer treebank and extracting Khmer grammar rules from a set of sentences taken from Khmer grammar books. The sentences are first manually annotated; once the treebank is obtained, it is processed to generate a number of grammar rules with their probabilities. In our experiments, the annotated trees and the extracted grammar rules are analyzed both quantitatively and qualitatively. Finally, the results are evaluated under three evaluation schemes (Self-Consistency, 5-Fold Cross-Validation, and Leave-One-Out Cross-Validation) using three measures: Precision, Recall, and F1-Measure. On these three measures, Self-Consistency gives the best result at more than 92%, followed by Leave-One-Out Cross-Validation and 5-Fold Cross-Validation with averages of 88% and 75%, respectively. On the crossing-bracket measure, however, Leave-One-Out Cross-Validation has the highest average at 96%, while the other two reach 85% and 89%, respectively.
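
The step of generating grammar rules with probabilities from a treebank can be illustrated with a minimal sketch. It assumes relative-frequency estimation, where each rule's probability is its count divided by the count of all rules sharing the same left-hand side; the abstract does not state which estimator the authors use. The bracketed tree notation, the function names, and the English toy sentences are all hypothetical placeholders, not material from the paper.

```python
from collections import Counter

# Hypothetical toy treebank in bracketed notation (not data from the paper).
TOY_TREEBANK = [
    "(S (NP (PRO I)) (VP (V eat) (NP (N rice))))",
    "(S (NP (N dogs)) (VP (V run)))",
]


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def productions(tree):
    """Yield one (lhs, rhs) rule for every internal node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for text in treebank:
        for lhs, rhs in productions(parse_tree(text)):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


pcfg = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

With this estimator, the probabilities of all rules that share a left-hand side sum to one, which is the property a probabilistic context-free grammar requires.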

Journal: IJITEE (International Journal of Information Technology and Electrical Engineering)
Publication Date: Dec 11, 2019
License type: CC BY-NC-ND 4.0

Similar Papers

Part of Speech Tagger for Marathi Language
Sharvari Govilkar ... Bakal J. W
International Journal of Computer Applications | VOL. 119
18 Jun 2015

An approach to reduce part of speech ambiguity using semantically annotated lexicon definitions
Andrei Minca ... Stefan Diaconescu
01 Sep 2012

An Approach to Reduce Part of Speech Ambiguity Using Semantically Annotated Lexicon Definitions
Andrei Minca ... Stefan Diaconescu
01 Jan 2013

Towards POS Tagging Methods for Bengali Language: A Comparative Analysis
Fatima Jahara ... Iqbal H Sarker
01 Jan 2020
