Abstract

Determining the semantic similarity between documents is crucial to many tasks, such as plagiarism detection, automatic technical surveys, and semantic search. In this paper, we focus on detecting the semantic similarity between queries and the documents of a large collection in the context of an Arabic search engine. We investigate MapReduce as a framework for managing the distributed processing of datasets and the computation of semantic similarity measures between documents, and we review the state of the art of approaches for computing document similarity. We then propose an approach based on a parallel algorithm for semantic similarity measures that uses MapReduce and WordNet, applied after a translation phase, to detect the documents relevant to an Arabic query. The numerical results show the efficiency and performance of the adopted technique.
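
To make the map and reduce roles concrete, the following is a minimal single-process sketch, not the paper's implementation. It assumes the Arabic query has already been translated to English, replaces WordNet with a small hypothetical synonym table (`SYNSETS`), and uses Jaccard overlap of concept sets as a stand-in similarity measure. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy synonym table standing in for WordNet synsets (assumption).
SYNSETS = {"car": "vehicle", "automobile": "vehicle", "rapid": "fast", "quick": "fast"}


def concepts(text):
    """Lowercase, split on whitespace, and map each token to its toy concept."""
    return {SYNSETS.get(tok, tok) for tok in text.lower().split()}


def map_chunk(doc_id, chunk):
    """Map step: emit (doc_id, concept set) for one chunk of a document."""
    return doc_id, concepts(chunk)


def reduce_doc(concept_sets, query_concepts):
    """Reduce step: merge a document's chunk sets, then score with Jaccard."""
    doc_concepts = set().union(*concept_sets)
    union = doc_concepts | query_concepts
    return len(doc_concepts & query_concepts) / len(union) if union else 0.0


def rank(chunks, query):
    """Run map, shuffle (group by doc_id), and reduce sequentially."""
    query_concepts = concepts(query)
    grouped = defaultdict(list)
    for doc_id, chunk in chunks:
        key, value = map_chunk(doc_id, chunk)
        grouped[key].append(value)
    scores = {d: reduce_doc(sets, query_concepts) for d, sets in grouped.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Two documents, the first split into two chunks; the query is assumed
# to be the English translation of the user's Arabic query.
chunks = [
    ("d1", "rapid automobile"),
    ("d1", "on the road"),
    ("d2", "slow train"),
]
ranking = rank(chunks, "quick car")
print(ranking)
```

Here `d1` ranks first even though it shares no surface word with the query, because both map to the same concepts. In a real deployment the map and reduce functions would run on a MapReduce cluster over the document collection, and the concept lookup would query WordNet rather than a hand-written table.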
