Semantic Similarity Measures for Malay Sentences

Shahrul Azman Noah, Nazlia Omar, Amru Yusrin Amruddin

doi:10.1007/978-3-540-77094-7_19

Publication Date: Dec 10, 2007

Citations: 13

Affiliation: National University of Malaysia

Abstract

The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. For languages such as Malay, however, text similarity has proved difficult because no comparable resources are available. This paper describes our approach to text similarity for the Malay language. We use a preprocessed Malay dictionary and an overlap edge-counting method to first calculate word-to-word semantic similarity. The word-to-word measure is then used to compute semantic sentence similarity, using a modified version of an approach developed for English. The experimental results are encouraging and indicate the potential of semantic similarity measures for Malay sentences.
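
The two-stage idea in the abstract (word-to-word similarity first, then sentence similarity built on top of it) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give their formulas, so the word measure here is a plain Jaccard overlap of dictionary glosses rather than edge counting, and the sentence measure is a symmetric average of best word matches. The `GLOSSES` entries and all function names are hypothetical and exist only for illustration.

```python
# Toy gloss dictionary. Hypothetical entries for illustration only,
# not taken from the preprocessed Malay dictionary used in the paper.
GLOSSES = {
    "kereta": {"kenderaan", "darat", "beroda", "enjin"},
    "motosikal": {"kenderaan", "darat", "beroda", "dua"},
    "buku": {"helaian", "kertas", "berjilid"},
}


def word_similarity(w1, w2):
    """Assumed word measure: Jaccard overlap of the two words' gloss sets."""
    if w1 == w2:
        return 1.0
    g1 = GLOSSES.get(w1, set())
    g2 = GLOSSES.get(w2, set())
    if not g1 or not g2:
        return 0.0  # a word without a gloss matches only itself
    return len(g1 & g2) / len(g1 | g2)


def directed_similarity(source, target):
    """Average, over source words, of the best match found in target."""
    if not source or not target:
        return 0.0
    best = [max(word_similarity(w, v) for v in target) for w in source]
    return sum(best) / len(source)


def sentence_similarity(s1, s2):
    """Assumed sentence measure: mean of the two directed scores."""
    t1 = s1.lower().split()
    t2 = s2.lower().split()
    return 0.5 * (directed_similarity(t1, t2) + directed_similarity(t2, t1))


if __name__ == "__main__":
    print(sentence_similarity("saya beli kereta", "saya beli motosikal"))
    print(sentence_similarity("saya beli kereta", "saya beli buku"))
```

In this sketch, "kereta" (car) and "motosikal" (motorcycle) share most of their gloss words, so the first sentence pair scores higher than the pair ending in "buku" (book). Any other word-to-word measure, including an edge-counting one, could be substituted for `word_similarity` without changing the sentence-level function.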

Full Text