Plagiarism is a significant problem in academia worldwide, and it is particularly challenging to detect in Arabic because of the language's complex structure. This dissertation presents ADPDM, a model and framework for detecting plagiarism in Arabic documents in academic contexts. By organizing documents logically into paragraphs, sentences, and words, the model aims to provide a robust system that identifies duplicated content and searches for similar documents within the corresponding sets it identifies. The study examines preprocessing techniques (stop word removal, stemming, and root extraction), followed by content-based methods that use fingerprinting and heuristic algorithms tailored to features of the Arabic language. For efficient detection, the BKDR hash function is used for chunk hashing. To reduce computation time, heuristic algorithms are applied at different levels of document representation, using metrics such as Longest Common Substring (LCS). ADPDM is evaluated on a corpus of 100 documents, which includes data from AraPlagDet and the Decision Support System (DSS). Its performance is compared with that of WCopyFind, which runs faster than ADPDM. ADPDM achieves recall, precision, and F-measure values of 0.78035, 0.994264, and 0.865688, respectively, which indicates a notable ability to detect plagiarized content in Arabic documents. ADPDM is therefore an effective anti-plagiarism solution for Arabic text, even though it requires a longer processing time than WCopyFind.
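The fingerprinting step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the seed value 131, the 31-bit mask, hashing Unicode code points rather than bytes, the word-level chunk size `k`, and the containment score are all assumptions chosen for illustration, and the function names are hypothetical. The last function shows a word-level longest common substring of the kind the LCS metric refers to.

```python
from typing import List, Set


def bkdr_hash(text: str, seed: int = 131) -> int:
    """BKDR string hash: h = h * seed + code, kept to 31 bits.

    Assumption: each character contributes its Unicode code point,
    so Arabic text can be hashed without choosing a byte encoding.
    """
    h = 0
    for ch in text:
        h = (h * seed + ord(ch)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def chunk_fingerprints(words: List[str], k: int = 3) -> Set[int]:
    """Hash every run of k consecutive words (a chunk) into a fingerprint set."""
    return {
        bkdr_hash(" ".join(words[i:i + k]))
        for i in range(len(words) - k + 1)
    }


def containment(suspicious: List[str], source: List[str], k: int = 3) -> float:
    """Fraction of the suspicious document's chunks that also occur in the source."""
    fp_suspicious = chunk_fingerprints(suspicious, k)
    if not fp_suspicious:
        return 0.0
    fp_source = chunk_fingerprints(source, k)
    return len(fp_suspicious & fp_source) / len(fp_suspicious)


def longest_common_run(a: List[str], b: List[str]) -> int:
    """Length, in words, of the longest contiguous word sequence shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Tokens are assumed to be already preprocessed
    # (stop words removed, words stemmed or reduced to roots).
    source = "الانتحال مشكلة كبيرة في البحث العلمي".split()
    suspicious = "مشكلة كبيرة في البحث".split()
    print(containment(suspicious, source, k=3))
    print(longest_common_run(suspicious, source))
```

Comparing sets of integer fingerprints avoids repeated string comparisons between documents, and the more expensive word-level comparison can then be restricted to document pairs whose fingerprint overlap is high. Whether ADPDM combines the two steps in this way is not stated in the abstract.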