Abstract

We evaluate existing and modified approaches for measuring the semantic similarity of sentences in the Malay language. These approaches have mainly been applied to English sentences, and no studies to date have evaluated and compared their effectiveness on Malay sentences. We used a pre-processed Malay machine-readable dictionary to calculate word-to-word semantic similarity with two methods: probability of intersection and normalization. We then used the word-to-word similarity measure to compute sentence-level semantic similarity. We evaluated five measures of semantic sentence similarity: vector-based semantic similarity, word order similarity, highest word-to-sentence similarity, a combination of vector-based and word-to-sentence similarity, and a combination of word order and word-to-sentence similarity. We also evaluated the effects of including and excluding lexical components such as prepositions, conjunctions, verbs, and morphological variants.
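To make two of the named measures concrete, the following is a minimal sketch of one common formulation of vector-based semantic similarity and word order similarity built on top of a word-to-word similarity function. It is not taken from the paper: the function names, the 0.5 matching threshold, and the toy similarity table `TOY_SIM` (standing in for the dictionary-derived word-to-word scores) are all assumptions made for illustration, and sentences are assumed to be non-empty token lists.

```python
import math

# Hypothetical word-to-word similarity scores; the values are invented
# for illustration and do not come from any Malay dictionary.
TOY_SIM = {frozenset(["besar", "luas"]): 0.8}


def word_sim(a, b):
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset([a, b]), 0.0)


def joint_words(s1, s2):
    """Distinct words of both sentences, in order of first appearance."""
    return list(dict.fromkeys(s1 + s2))


def semantic_vector(words, joint):
    """For each joint word, the best similarity to any word in the sentence."""
    return [max(word_sim(w, t) for t in words) for w in joint]


def vector_similarity(s1, s2):
    """Cosine of the two semantic vectors over the joint word set."""
    joint = joint_words(s1, s2)
    u, v = semantic_vector(s1, joint), semantic_vector(s2, joint)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def order_vector(words, joint, threshold=0.5):
    """1-based position of the best-matching sentence word, or 0 if none
    reaches the (assumed) threshold."""
    vec = []
    for w in joint:
        idx, best = max(
            enumerate(word_sim(w, t) for t in words), key=lambda p: p[1]
        )
        vec.append(idx + 1 if best >= threshold else 0)
    return vec


def word_order_similarity(s1, s2):
    """1 - ||r1 - r2|| / ||r1 + r2|| over the two order vectors."""
    joint = joint_words(s1, s2)
    r1, r2 = order_vector(s1, joint), order_vector(s2, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


if __name__ == "__main__":
    s1 = ["rumah", "itu", "besar"]
    s2 = ["rumah", "itu", "luas"]
    print(vector_similarity(s1, s2))
    print(word_order_similarity(s1, s2))
```

In this sketch the vector-based score reflects how well each sentence covers the joint vocabulary, while the word order score penalizes sentences that use matching words in different positions. A combined measure could weight the two scores, but the weighting used in the study is not stated in the abstract.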
