Abstract
Semantic textual similarity is essential for many natural language processing applications, but measuring it is not easy. Sentences come in different types, and the diversity of sentence structures makes assessing semantic similarity a formidable task. When two texts are lexically dissimilar but semantically similar, traditional lexical matching cannot capture the actual degree of semantic similarity. In addition, the lack of well-recognized language processing resources for Bengali makes computing semantic textual similarity for Bengali texts difficult. In this paper, we measure the semantic similarity of Bengali texts using word-level similarity and parts-of-speech tags. To assess semantic similarity, we exploit a Bengali parts-of-speech tagger and a pre-trained word-embedding model. We then use the maximum word-to-word similarity between words that share the same parts-of-speech tag. We also introduce a grammatical-role-level similarity into the proposed method to measure sentence similarity. To validate the method, we conducted experiments on a publicly available benchmark Bengali dataset. The results demonstrate that the proposed method effectively measures the degree of similarity of Bengali texts and achieves state-of-the-art performance.
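The word-level step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes cosine similarity as the word-to-word measure, a plain average of per-word maxima, and a symmetric mean of the two directions, none of which the abstract specifies. The function names, the toy two-dimensional vectors, and the English stand-in tokens are all hypothetical; a real pipeline would take tags from a Bengali parts-of-speech tagger and vectors from a pre-trained embedding model. The grammatical-role-level similarity is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_similarity(source, target, embeddings):
    """Average, over source words, of the best match among same-tag target words.

    source and target are lists of (word, pos_tag) pairs. A word with no
    same-tag counterpart, or with no embedding, contributes 0.0 (an assumption).
    """
    scores = []
    for word, tag in source:
        candidates = [
            cosine(embeddings[word], embeddings[other])
            for other, other_tag in target
            if other_tag == tag and word in embeddings and other in embeddings
        ]
        scores.append(max(candidates, default=0.0))
    return sum(scores) / len(scores) if scores else 0.0


def sentence_similarity(a, b, embeddings):
    """Symmetric score: mean of the two directed similarities (an assumption)."""
    return 0.5 * (
        directed_similarity(a, b, embeddings)
        + directed_similarity(b, a, embeddings)
    )


# Hypothetical toy data standing in for tagger output and embedding lookups.
toy_embeddings = {
    "boy": [1.0, 0.2],
    "child": [0.9, 0.3],
    "runs": [0.1, 1.0],
    "walks": [0.3, 0.9],
    "quickly": [0.5, 0.5],
}
sentence_1 = [("boy", "NOUN"), ("runs", "VERB")]
sentence_2 = [("child", "NOUN"), ("walks", "VERB")]
sentence_3 = [("quickly", "ADV")]

score = sentence_similarity(sentence_1, sentence_2, toy_embeddings)
print(f"similarity(sentence_1, sentence_2) = {score:.3f}")
```

Restricting candidates to the same tag keeps a noun from being matched against a verb that merely happens to lie close in the embedding space, which appears to be the motivation for combining the two resources.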