Semantic Textual Similarity in Bengali Text

Md Shajalal, Masaki Aono

doi:10.1109/icbslp.2018.8554940

Abstract

Measuring textual similarity is indispensable in many information retrieval applications. Researchers have proposed numerous measures for computing the semantic similarity between texts in monolingual and multilingual settings, but methods for measuring the similarity of Bengali text segments are not commonly available. In this paper, we propose an approach to estimate the semantic similarity between Bengali text segments. We employ an algorithm that computes the similarity score from word-level semantics provided by a word-embedding model pre-trained on Bengali Wikipedia texts. To test the performance of our method, we conducted experiments on a dataset for semantic textual similarity in Bengali, which we prepared following the same approach that SemEval applied in STS 2017. The experimental results, measured in terms of the Pearson correlation coefficient, indicate that our method achieves state-of-the-art performance for semantic textual similarity in Bengali texts.
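As a rough illustration of the pipeline outlined in the abstract, the sketch below scores two Bengali sentences from word vectors and evaluates predictions with the Pearson correlation coefficient. It is only a minimal sketch under stated assumptions: the abstract does not specify the algorithm, so averaging word vectors and taking their cosine is a common baseline swapped in here, not necessarily the authors' method. The `TOY_EMBEDDINGS` table, its values, and all function names are hypothetical stand-ins for a model trained on Bengali Wikipedia.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a pre-trained
# Bengali word-embedding model; the values are invented for illustration.
TOY_EMBEDDINGS = {
    "আমি": [0.9, 0.1, 0.0],
    "ভাত": [0.1, 0.8, 0.2],
    "খাই": [0.2, 0.7, 0.3],
    "খাবার": [0.1, 0.9, 0.1],
    "বই": [0.0, 0.1, 0.9],
    "কিনি": [0.1, 0.0, 0.8],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known word, so no sentence vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(text_a, text_b, embeddings=TOY_EMBEDDINGS):
    """Baseline similarity: cosine of averaged word vectors (an assumption)."""
    vec_a = sentence_vector(text_a.split(), embeddings)
    vec_b = sentence_vector(text_b.split(), embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Sentences about eating should score closer to each other than to one about books.
related = similarity("আমি ভাত খাই", "আমি খাবার খাই")
unrelated = similarity("আমি ভাত খাই", "আমি বই কিনি")
print(related > unrelated)
```

In an evaluation like the one the abstract describes, `pearson` would be applied to the list of predicted scores and the list of human-annotated scores for the same sentence pairs.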

Full Text