Measuring Semantic Similarity of Bengali Texts with Parts-of-Speech Tags and Word-Level Semantics

Md Atabuzzaman, Md Shajalal

doi:10.1109/iccit51783.2020.9392700

Abstract

Semantic textual similarity is essential to many natural language processing applications, but it is difficult to measure: sentences come in many types, and the diversity of their structures makes assessing semantic similarity a formidable task. When two texts are lexically dissimilar but semantically similar, traditional lexical matching cannot return the actual degree of semantic similarity. In addition, the lack of well-recognized language processing resources for Bengali makes the calculation harder still. In this paper, we measure the semantic similarity of Bengali texts using word-level similarity and parts-of-speech tags. To assess similarity, we exploit a Bengali parts-of-speech tagger and a pre-trained word-embedding model. We then use the maximum word-to-word similarity between words, considering only pairs of words that share the same parts-of-speech tag. We also introduce a grammatical-role-level similarity into the proposed method for measuring sentence similarity. To validate the method, we conducted experiments on a publicly available benchmark Bengali dataset. The results demonstrate that the proposed method is effective at measuring the degree of similarity of Bengali texts and achieves state-of-the-art performance.
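
The core idea of matching words only within the same parts-of-speech tag can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the per-word maxima are aggregated, so the averaging and the symmetric combination below are assumptions, and the grammatical-role-level similarity is omitted because it is not defined here. The function names, the toy two-dimensional vectors, and the English placeholder tokens are all hypothetical; a real system would use the output of a Bengali tagger and a pre-trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_similarity(source, target, embeddings):
    """For each (word, tag) in source, take the maximum cosine similarity to any
    target word with the same tag, then average the maxima.
    ASSUMPTION: averaging is an illustrative choice, not taken from the paper."""
    maxima = []
    for word, tag in source:
        candidates = [
            cosine(embeddings[word], embeddings[other])
            for other, other_tag in target
            if other_tag == tag and word in embeddings and other in embeddings
        ]
        # A word with no same-tag counterpart contributes 0.0.
        maxima.append(max(candidates, default=0.0))
    return sum(maxima) / len(maxima) if maxima else 0.0


def sentence_similarity(sent_a, sent_b, embeddings):
    """ASSUMPTION: symmetrize by averaging both directions."""
    return 0.5 * (
        directed_similarity(sent_a, sent_b, embeddings)
        + directed_similarity(sent_b, sent_a, embeddings)
    )


# Hypothetical toy data standing in for tagger output and pre-trained vectors.
toy_embeddings = {
    "boy": [1.0, 0.0],
    "child": [0.9, 0.1],
    "runs": [0.0, 1.0],
    "walks": [0.1, 0.9],
}
sentence_1 = [("boy", "NOUN"), ("runs", "VERB")]
sentence_2 = [("child", "NOUN"), ("walks", "VERB")]

print(sentence_similarity(sentence_1, sentence_2, toy_embeddings))
```

Restricting candidates to the same tag means a noun is never matched against a verb, even if their vectors happen to be close; words with no same-tag counterpart simply score zero in this sketch.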