Abstract

Due to the rapid growth of internet-based data, there is an urgent need for a robust, intelligent document security mechanism. Although there have been many attempts to build plagiarism detection systems for natural language documents, the unlimited variation and differing writing styles of each character in Arabic documents make building such systems challenging. Depending on its position in a word, the same Arabic letter can be written in three different ways, which makes handwritten character recognition a cumbersome process. This article proposes an intelligent unsupervised model, called ASTAP, to detect plagiarism in these documents. First, a handwritten Arabic character recognition system is proposed using the Grey Wolf Optimization (GWO) algorithm. Then, a modified Abstract Syntax Tree (AST) is used to match the contents of the Arabic documents and detect any similarity. Compared to state-of-the-art methods, ASTAP improves the effectiveness of plagiarism detection in terms of the matched similarity ratio, the precision ratio, and the processing time.

Highlights

  • Due to the rapid growth of internet-based data, there is an urgent need for a robust, intelligent document security mechanism

  • Compared to state-of-the-art methods, ASTAP improves the effectiveness of plagiarism detection in terms of the matched similarity ratio, the precision ratio, and the processing time

  • The performance of the proposed ASTAP is calculated in different terms, such as the similarity, the

Summary

RELATED WORKS

Many studies aim to detect similarities between documents. Despite the considerable effort devoted to discovering similarity in Arabic documents, most previous work has focused on the electronic form of these documents.

Plaggie produces its results according to the configuration parameters chosen. Plaggie resembles JPlag, but it is a stand-alone, command-line Java application that has to be set up locally. Both tools use tokenization followed by the Greedy String Tiling (GST) algorithm to detect matches between two source code files.

(Borner, et al, 2012) introduced a system to check plagiarism in Arabic documents. In the preprocessing phase, it tokenizes the text, removes stop-words, and converts the words to their roots; after that, the words are replaced with their synonyms.

The radial basis function (RBF) kernel is selected and used in this paper, as it is the most suitable kernel for SVM in different applications, such as image and document processing (Yuan, et al, 2017).

Abstract Syntax Tree (AST) is a similarity detection algorithm (Zaher, et al, 2017) that analyzes similarity detection schemes to detect plagiarism efficiently.

Initialize: ci := node i, TMC := count for all nodes, x := node value
for each ci in {TT} do
    if 0 < n < TMC
        ks := x + ci
    ...
Endfor
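The Greedy String Tiling step mentioned above for Plaggie and JPlag can be illustrated with a minimal sketch. This is not the code of either tool: it is the basic, unoptimized form of the algorithm (without the Karp-Rabin hashing speed-up), and the names `greedy_string_tiling`, `similarity`, and `min_match` are hypothetical. The similarity score uses one common normalization, twice the covered tokens divided by the combined length, which is assumed here rather than taken from the summary.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int]  # (start in a, start in b, length)


def greedy_string_tiling(a: List[str], b: List[str], min_match: int = 3) -> List[Tile]:
    """Cover a and b with non-overlapping common token runs, longest first."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: List[Tile] = []
    while True:
        max_len = min_match
        matches: List[Tile] = []
        # Find all maximal runs of unmarked, equal tokens of the longest length.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches, max_len = [(i, j, k)], k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            break
        # Turn each match into a tile unless an earlier tile already overlaps it.
        for i, j, k in matches:
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                for off in range(k):
                    marked_a[i + off] = True
                    marked_b[j + off] = True
                tiles.append((i, j, k))
    return tiles


def similarity(a: List[str], b: List[str], min_match: int = 3) -> float:
    """Assumed normalization: 2 * covered tokens / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2.0 * covered / total


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "a lazy dog and the quick brown fox".split()
    print(greedy_string_tiling(doc_a, doc_b, min_match=2))
    print(round(similarity(doc_a, doc_b, min_match=2), 3))
```

In this sketch the tokens are whitespace-separated words; for source code, as in the tools described above, they would be lexer tokens. A larger `min_match` suppresses short coincidental matches at the cost of missing small copied fragments.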
RESULTS AND DISCUSSION
Evaluation Criteria and Datasets
Results and Discussion
CONCLUSION AND FUTURE WORK
