Abstract
Plagiarism, the act of taking someone else’s work (ideas, or even words) and presenting it as one’s own, has grown rapidly alongside the explosive growth of the Internet, whose massive volume of easily accessible information makes plagiarism easy to commit. To ensure originality, plagiarism detection has become necessary in many areas, so that people who intend to plagiarize must instead invest considerable effort in producing work based on their own research.
In this paper, we propose a candidate retrieval model to improve the detection of textual plagiarism. The model adopts the vector space model (VSM) as its retrieval model. However, instead of representing documents as vectors of term weights, it represents them as vectors of average term weights and uses these vectors as queries for retrieval. The second phase, detailed comparison, applies a fuzzy semantic-based string similarity. The proposed system was evaluated on the PAN-PC-10 dataset. Because the problem addressed in this paper is restricted to extrinsic plagiarism detection in English documents, experiments were performed only on the extrinsic-detection portion of the corpus and only on English documents. Precision, Recall, and F-measure were used to evaluate the candidate retrieval model, and the overall performance of the system was assessed with the five standard PAN measures: Precision, Recall, F-measure, Granularity, and PlagDet. The experimental results show that the proposed candidate retrieval model has a positive impact on the overall performance of the system and that the system outperforms other state-of-the-art methods. The proposed model detected about 80% of the plagiarism cases, and about 90% of its detections were correct. It can detect literal plagiarism as well as cases involving paraphrasing. A performance comparison shows that the proposed system is comparable to or outperforms the other baseline systems in terms of the five evaluation metrics.
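The five PAN measures mentioned in the abstract are computed at the character level. The sketch below illustrates precision, recall, granularity, and the overall PlagDet score (F1 divided by log2(1 + granularity)) as they are commonly defined for the PAN competitions. It is a simplified illustration, not the official PAN evaluation script; the span representation and the helper names (`chars`, `detects`, `overlap`) are hypothetical.

```python
import math
from typing import FrozenSet, List, Tuple

# A character reference is (document id, character offset).
CharRef = Tuple[str, int]
# A plagiarism case or a detection: (suspicious-side chars, source-side chars).
# This span representation is an illustrative assumption.
Span = Tuple[FrozenSet[CharRef], FrozenSet[CharRef]]


def chars(doc: str, start: int, length: int) -> FrozenSet[CharRef]:
    """Character references covering [start, start + length) in doc (hypothetical helper)."""
    return frozenset((doc, i) for i in range(start, start + length))


def size(x: Span) -> int:
    return len(x[0]) + len(x[1])


def detects(r: Span, s: Span) -> bool:
    """Detection r detects case s if they overlap on both the suspicious and source side."""
    return bool(r[0] & s[0]) and bool(r[1] & s[1])


def overlap(s: Span, r: Span) -> FrozenSet[CharRef]:
    """Characters shared by s and r, counted only when r detects s."""
    if not detects(r, s):
        return frozenset()
    return (s[0] & r[0]) | (s[1] & r[1])


def precision(cases: List[Span], dets: List[Span]) -> float:
    if not dets:
        return 0.0
    # Fraction of each detection's characters that belong to some detected case.
    return sum(
        len(frozenset().union(*(overlap(s, r) for s in cases))) / size(r) for r in dets
    ) / len(dets)


def recall(cases: List[Span], dets: List[Span]) -> float:
    if not cases:
        return 0.0
    # Fraction of each case's characters covered by some detection.
    return sum(
        len(frozenset().union(*(overlap(s, r) for r in dets))) / size(s) for s in cases
    ) / len(cases)


def granularity(cases: List[Span], dets: List[Span]) -> float:
    # Average number of detections per detected case (1.0 is ideal).
    # Returning 1.0 when nothing is detected is a convention assumed here.
    counts = [sum(detects(r, s) for r in dets) for s in cases]
    detected = [c for c in counts if c > 0]
    return sum(detected) / len(detected) if detected else 1.0


def plagdet(cases: List[Span], dets: List[Span]) -> float:
    p, r = precision(cases, dets), recall(cases, dets)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return f1 / math.log2(1 + granularity(cases, dets))


# One case reported as two adjacent detections: perfect precision and recall,
# but granularity 2 lowers PlagDet to 1 / log2(3).
case = (chars("susp", 0, 100), chars("src", 0, 100))
dets = [
    (chars("susp", 0, 50), chars("src", 0, 50)),
    (chars("susp", 50, 50), chars("src", 50, 50)),
]
print(precision([case], dets), recall([case], dets), granularity([case], dets))  # 1.0 1.0 2.0
print(round(plagdet([case], dets), 3))  # 0.631
```

The example shows why granularity is reported separately: a system that splits one case into several detections is penalized even when every character is found correctly.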
Highlights
With the explosive growth of the Internet, the massive volume of easily accessible information makes it easy to take someone else’s work and present it as one’s own
Plagiarism detection (PD) is an application of Natural Language Processing (NLP) that draws on methods from related fields such as soft computing (SC), data mining (DM), and information retrieval (IR)
Based on the commonly used VSM retrieval model, a candidate retrieval model has been proposed to supply the candidate source documents required by the detailed comparison stage. By representing documents as vectors of the average weights of their terms, rather than of the term weights themselves, and measuring the similarity between the centers of the documents, the proposed model improves both retrieval performance and the overall performance of the plagiarism detection system
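The centroid-based retrieval idea in the highlight above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: it assumes each document is split into sentences, each sentence is weighted with a smoothed TF-IDF, and a document’s center is the average of its sentence vectors. The suspicious document’s center serves as the query, and source documents are ranked by cosine similarity. The sentence segmentation, the weighting scheme, the cut-off `k`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def segments(doc: str) -> List[List[str]]:
    # Assumption: a document is segmented into sentences on . ! ?
    return [toks for toks in (tokenize(p) for p in re.split(r"[.!?]+", doc)) if toks]


def idf_table(docs: List[str]) -> Dict[str, float]:
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def center(doc: str, idf: Dict[str, float]) -> Vector:
    """Average term weights: the mean of the document's sentence TF-IDF vectors."""
    segs = segments(doc)
    total: Vector = {}
    for seg in segs:
        for term, count in Counter(seg).items():
            total[term] = total.get(term, 0.0) + (count / len(seg)) * idf.get(term, 1.0)
    return {t: w / len(segs) for t, w in total.items()} if segs else {}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_candidates(suspicious: str, sources: Dict[str, str], k: int = 2) -> List[str]:
    """Rank sources by cosine similarity between document centers; keep the top k."""
    idf = idf_table([suspicious, *sources.values()])
    query = center(suspicious, idf)  # the suspicious document's center is the query
    scores = {name: cosine(query, center(text, idf)) for name, text in sources.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


sources = {
    "src1": "The cat sat on the mat. The cat was happy.",
    "src2": "Stock markets fell sharply today. Investors were worried.",
    "src3": "A dog chased the cat across the mat.",
}
suspicious = "The cat sat on a mat. It was a happy cat."
print(retrieve_candidates(suspicious, sources, k=3))  # ['src1', 'src3', 'src2']
```

Only the top-k sources would then be passed to the detailed comparison phase; in a sketch like this, `k` trades the recall of the candidate set against the cost of the more expensive pairwise comparison.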
Summary
With the explosive growth of the Internet, the massive volume of easily accessible information makes it easy to take someone else’s work and present it as one’s own. Plagiarism is defined as reusing someone else’s work (ideas, or even words) without citing the source [1]. Today, plagiarism detection is needed in many areas to ensure the originality of texts, materials, and resources. Plagiarism detection tools can play a crucial role in deterring intentional plagiarism, so that would-be plagiarists must instead invest considerable effort in contributing novel ideas or techniques to the academic world based on their own research [2]. Plagiarism detection (PD) is an application of Natural Language Processing (NLP) that draws on methods from related fields such as soft computing (SC), data mining (DM), and information retrieval (IR). PD research focuses on discovering illegal copying of text patterns from other sources [3].