Abstract

The aim of this study is to design a text mining tool for plagiarism detection in Arabic documents. Plagiarism is the act of using or closely imitating the language and thoughts of another author without authorization and representing that author's work as one's own, for example by not crediting the original author (El-Matarawy <em>et al</em>., 2013). Detecting plagiarism in Arabic-language documents is difficult because of the complex linguistic structure of the language. Text mining, a branch of data mining, plays an important role in natural language processing. In this study, we present a plagiarism detection architecture that compares Arabic texts and identifies similarities using text mining methods. A new text mining algorithm, the Text Mining Algorithm for Plagiarism Detection (TMA-PD), is proposed to generate tokens from a given Arabic document. A new framework, which combines the familiar text mining methods <em>Topic Tracking</em>, <em>Clustering</em> and <em>Concept Linkage</em>, is also proposed. The tool thus reduces the time spent on the preliminary stage of plagiarism detection. Because existing text mining tools cannot process Arabic documents, a new add-on for this purpose will be developed and integrated. Software agents are also used to improve the comparisons and to find additional similar texts. The performance of the tool will be evaluated on a large data set of Arabic texts.
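To make the idea of token generation followed by similarity comparison concrete, the following is a minimal sketch in Python. It is not the TMA-PD algorithm, whose details are not given here; it assumes a simple normalization step (removing diacritics and unifying common letter variants), regex-based word tokenization, and Jaccard similarity over word bigrams as a stand-in comparison measure. All function names (`normalize`, `tokenize`, `shingles`, `jaccard`) are hypothetical and chosen only for illustration.

```python
import re

# Assumption: Arabic diacritics (U+064B..U+0652) and tatweel (U+0640) carry no
# information needed for comparison, so they are removed before tokenizing.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Assumption: a token is a maximal run of basic Arabic letters (U+0621..U+064A).
_ARABIC_WORD = re.compile("[\u0621-\u064A]+")


def normalize(text):
    """Strip diacritics and unify common orthographic variants."""
    text = _DIACRITICS.sub("", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


def tokenize(text):
    """Return the list of normalized Arabic word tokens in the text."""
    return _ARABIC_WORD.findall(normalize(text))


def shingles(tokens, n=2):
    """Return the set of overlapping word n-grams (as tuples)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard similarity of the word n-gram sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(tokenize(a), n), shingles(tokenize(b), n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Under these assumptions, two passages that differ only in diacritics or in hamza spelling yield identical token lists and therefore a similarity of 1.0, while passages with no shared word bigrams score 0.0. A real system would likely add stemming and stop-word handling, which this sketch omits.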
