Abstract

With easy access to the huge volume of articles available on the Internet, plagiarism is becoming increasingly widespread. Most recent approaches to this problem focus on improving the accuracy of the similarity detection process. However, in some real applications plagiarized content must be detected without revealing any information. Moreover, in such web-based applications, running time, memory consumption, and communication and computational complexity should also be taken into account. In this paper, we propose a similar document detection system based on the matrix Bloom filter, a new extension of the standard Bloom filter. Experimental results on a real dataset show that the system achieves 98% accuracy. We also compare our approach with a method recently proposed for the same purpose. The comparison shows that the Bloom filter-based approach performs much better than the other method in terms of the aforementioned factors.
