Abstract

In this work, we present a rule-based technique for automatically detecting bi-gram compound words in standard Malay documents. The scope is limited to bi-gram compounds formed by Noun-Noun, Noun-Adjective and Noun-Verb combinations. We identified several limitations of existing methods for detecting Malay compound words that relate to the structure of Malay sentences. Before detection, a preprocessing step was applied to produce a list of compound word candidates. During detection, we used dictionary and thesaurus information to apply Part of Speech (POS) tags to all the words in the selected Malay documents. After tagging, we modified several existing identification rules according to Malay grammar and sentence patterns in order to increase recall, precision and F1-score. All evaluation values were compared with previous work. Testing was done on 3124 sentences taken from Utusan Melayu news. On average, the results showed an improvement over previous research, with a precision of 93.8%, a recall of 31.1% and an F1-score of 43.8%.
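The pipeline described above (dictionary-based POS tagging followed by matching bi-gram tag patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny lexicon, the tag names and the function names are all hypothetical, and the sketch covers only candidate generation from the three in-scope patterns, not the modified grammar rules that filter the candidates.

```python
# Hypothetical mini-lexicon mapping a word to a POS tag.
# A real system would take its tags from a Malay dictionary and thesaurus.
LEXICON = {
    "kereta": "N",   # car / vehicle
    "api": "N",      # fire
    "orang": "N",    # person
    "tua": "ADJ",    # old
    "meja": "N",     # table
    "makan": "V",    # eat
    "itu": "DET",    # that
    "sudah": "ADV",  # already
}

# Bi-gram tag patterns in scope: Noun Noun, Noun Adjective, Noun Verb.
PATTERNS = {("N", "N"), ("N", "ADJ"), ("N", "V")}


def tag(tokens):
    """Tag each token by dictionary lookup; unknown words get 'UNK'."""
    return [(token, LEXICON.get(token.lower(), "UNK")) for token in tokens]


def candidates(sentence):
    """Return adjacent word pairs whose tag pair matches an in-scope pattern."""
    tagged = tag(sentence.split())
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1, t2) in PATTERNS
    ]


if __name__ == "__main__":
    for text in ("kereta api itu sudah tua", "orang tua itu", "meja makan"):
        print(text, "->", candidates(text))
```

Pattern matching alone over-generates, since any adjacent noun and verb would be flagged. That is why a further rule-based filtering stage, such as the one the abstract describes, is needed after this step.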
