Arabic Sentences Semantic Similarity Based on Word Embedding

Badrya Dahy, Khaled Fathy, Mamdouh Farouk

doi:10.1109/esolec54569.2022.10009099

Abstract

Semantic textual similarity receives significant attention in natural language processing. It is useful in a variety of NLP applications, including information retrieval, plagiarism detection, data extraction, and machine translation. Sentence similarity in the Arabic language has not been investigated deeply because of the lack of Arabic language resources, yet accurately calculating the degree of similarity between Arabic sentences is critical. This research proposes a method for determining the semantic similarity of Arabic sentences. The proposed strategy uses word embedding to measure the similarity between words and combines more than one similarity measure to calculate the final similarity. Furthermore, due to the lack of Arabic resources, a new dataset for evaluating similarity techniques has been constructed and is available for public use. Experiments have been conducted to show the efficiency of the proposed strategy, using two datasets to compare it with other approaches. The experiments reveal that the proposed method outperforms alternative approaches to measuring sentence similarity in the Arabic language.

Full Text