Abstract

Semantic similarity is applied in many areas of Natural Language Processing (NLP), such as information retrieval, text classification, and plagiarism detection. Many researchers have applied semantic similarity to English texts, but few have applied it to Arabic because Arabic concepts are ambiguous in both sense and morphology. The first contribution of this paper is therefore a semantic similarity approach for Arabic sentences. The world currently faces the global problem of coronavirus disease. Under these circumstances and the distancing measures imposed, it is difficult for farmers to meet agricultural experts in person to obtain advice and find suitable solutions to their agricultural complaints. In addition, most farmers still use traditional practices. Our second contribution is thus to help farmers resolve their Arabic agricultural complaints using the proposed approach. Latent Semantic Analysis (LSA) is applied to retrieve the problem that is semantically closest to a farmer's complaint and to return the related solution. Two weighting schemes are used for data representation in this approach: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The proposed model also classifies the large agricultural dataset and the submitted farmer complaint by crop type using a MapReduce Support Vector Machine, in order to improve the semantic similarity results. The proposed approach with TF-IDF-based LSA outperformed its counterparts, achieving an F-measure of 86.7%.
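
The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the English tokens stand in for preprocessed Arabic complaints, the smoothed IDF formula, the rank `k = 2`, and the helper names `tfidf_matrix`, `vectorize`, and `rank_by_lsa` are all assumptions, and the MapReduce SVM classification step is omitted.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a term-by-document TF-IDF matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: an assumed variant, the source does not state its formula.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    matrix = np.zeros((len(vocab), n_docs))
    for col, doc in enumerate(docs):
        for tok, count in Counter(doc).items():
            matrix[row[tok], col] = count * idf[tok]
    return matrix, row, idf


def vectorize(tokens, row, idf):
    """Weight a query with the corpus IDF; unknown tokens are ignored."""
    vec = np.zeros(len(row))
    for tok, count in Counter(tokens).items():
        if tok in row:
            vec[row[tok]] = count * idf[tok]
    return vec


def rank_by_lsa(docs, query, k=2):
    """Rank documents by cosine similarity to the query in a rank-k latent space."""
    matrix, row, idf = tfidf_matrix(docs)
    u, _s, _vt = np.linalg.svd(matrix, full_matrices=False)
    u_k = u[:, :k]                       # top-k left singular vectors
    doc_vecs = (u_k.T @ matrix).T        # documents projected to k dimensions
    q_vec = u_k.T @ vectorize(query, row, idf)  # query folded into the same space
    scores = []
    for i, d_vec in enumerate(doc_vecs):
        denom = np.linalg.norm(d_vec) * np.linalg.norm(q_vec)
        scores.append((i, float(d_vec @ q_vec / denom) if denom > 0 else 0.0))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored complaints (placeholders for tokenized Arabic text).
complaints = [
    ["yellow", "leaves", "wheat", "rust"],
    ["wheat", "leaf", "rust", "spots"],
    ["tomato", "fruit", "rot"],
    ["tomato", "leaves", "wilting"],
]
ranked = rank_by_lsa(complaints, ["rust", "on", "wheat"], k=2)
best_index, best_score = ranked[0]
print(best_index, round(best_score, 3))
```

In a full system the index of the best-matching complaint would be used to look up its stored solution. Projecting both the documents and the query with the same `u_k` keeps them in one coordinate system, so the sign ambiguity of the SVD does not affect the ranking.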

Highlights

  • Semantic analysis plays an essential role in text analytics research

  • Comparing the evaluation results of TF-based latent semantic analysis (LSA) with those of TF-IDF-based LSA, we conclude that the TF-IDF-based approach performs best: Inverse Document Frequency (IDF) measures how important a term is across the complaints and gives high weight to important terms, whereas TF only counts how many times a term appears in a complaint

  • A semantic similarity approach for farmers' agricultural complaints is developed to address these issues. The approach uses LSA to measure the semantic similarity between a farmer's query and the complaint documents
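
The difference between the two weighting schemes named in the highlights can be illustrated with a small sketch. It assumes the classic `log(N / df)` form of IDF, which the source does not specify, and the toy complaints and function names are hypothetical.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Raw term frequency: how many times each term appears in one complaint."""
    return dict(Counter(tokens))


def idf_table(corpus):
    """IDF per term, using log(N / df); an assumed variant of the formula."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df[tok]) for tok in df}


def tfidf_weights(tokens, idf):
    """TF-IDF: term frequency scaled by how rare the term is in the corpus."""
    return {tok: count * idf.get(tok, 0.0) for tok, count in Counter(tokens).items()}


# Hypothetical tokenized complaints; "plant" appears in every one of them.
complaints = [
    ["plant", "leaves", "yellow", "plant"],
    ["plant", "roots", "rot"],
    ["plant", "fruit", "spots"],
]
idf = idf_table(complaints)
tf = tf_weights(complaints[0])
tfidf = tfidf_weights(complaints[0], idf)

print("TF:    ", tf)
print("TF-IDF:", tfidf)
```

Under TF, the uninformative term "plant" receives the largest weight in the first complaint because it occurs twice. Under TF-IDF its weight drops to zero because it occurs in every complaint, while the rarer terms "leaves" and "yellow" keep a positive weight.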

Summary

Introduction

Semantic analysis plays an essential role in text analytics research. Semantic similarity is used in several NLP fields, such as information retrieval, text summarization, plagiarism detection, question answering, document clustering, text classification, and machine translation [5], [6]. It is defined as determining whether two concepts are similar in meaning [7]. Two concepts can be semantically similar even when their lexical forms differ, because the similarity depends on information acquired from massive corpora. Arabic is considered the fifth most spoken language in the world. This paper applies a semantic similarity approach to an Arabic dataset.
