Semantic-Based Integrated Plagiarism Detection Approach for English Documents

Manpreet Kaur, Vishal Gupta, Ravreet Kaur

doi:10.1080/03772063.2021.2004383

Abstract

The proposed work models a novel plagiarism detection system that uses semantic features to uncover cases of plagiarism. The system constructs a dynamic relation matrix for each pair of suspicious and source sentences to measure their degree of similarity using semantic features. This paper presents two procedures, Weighted Inverse Distance and GlossDice, which exploit several text properties (synonyms, shortest path, etc.) to overcome the limitations of existing features, along with a new similarity metric for plagiarism detection. Moreover, this research investigates the independent performance of various features in detecting plagiarized cases and combines the best-performing features by assigning them different weight contributions to further enhance system performance. Weighted Inverse Distance integrated with SynJaccard boosts system performance and shows promising results. All experiments were initially performed on the PAN-PC-11 dataset, and the PAN-14 text alignment dataset was then used to validate the results of the proposed system. The effectiveness of the proposed system was measured using standard performance measures, i.e., Precision, Recall, F-measure, Granularity, and Plagdet score. On the PAN-PC-11 dataset, the proposed system outperformed the other baseline systems with precision (0.9459), recall (0.8861), F-measure (0.8917), and plagdet (0.8857). On the PAN-14 text alignment dataset, the system achieves precision (0.9257), recall (0.9055), F-measure (0.8931), and plagdet (0.8806).
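The abstract names three computational ingredients: a synonym-aware overlap feature (SynJaccard), a weighted combination of feature scores, and the plagdet measure. The following is a minimal sketch of how these pieces fit together, not the authors' implementation. The abstract does not define SynJaccard, Weighted Inverse Distance, or the weights used, so `syn_jaccard` is one plausible reading (Jaccard overlap after collapsing synonyms), the toy `SYNONYMS` table is a hypothetical stand-in for a lexical resource such as WordNet, and the Weighted Inverse Distance score is passed in as a placeholder value. Only `plagdet` follows the standard PAN definition, F1 divided by log2(1 + granularity).

```python
import math
from typing import Dict

# Hypothetical toy synonym groups; a real system would query a lexical resource.
SYNONYMS = [{"car", "automobile", "vehicle"}, {"big", "large"}, {"fast", "quick"}]


def canonical(token: str) -> str:
    """Map a token to a fixed representative of its synonym group (assumption)."""
    for group in SYNONYMS:
        if token in group:
            return min(group)
    return token


def syn_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of token sets after collapsing synonyms (hypothetical SynJaccard)."""
    sa = {canonical(t) for t in a.lower().split()}
    sb = {canonical(t) for t in b.lower().split()}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def combined_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of feature scores, normalized by the weight sum."""
    total = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PAN plagdet score: F1 discounted by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


if __name__ == "__main__":
    sim = syn_jaccard("a big car", "a large automobile")
    print(sim)  # 1.0: every token matches once synonyms are collapsed
    # "wid" is a placeholder value, not a computed Weighted Inverse Distance score.
    print(combined_score({"wid": 0.6, "synjaccard": sim}, {"wid": 0.5, "synjaccard": 0.5}))
    print(plagdet(0.5, 0.5, 3.0))  # 0.25
```

Because plagdet divides by log2(1 + granularity), a detector that reports each plagiarized passage exactly once (granularity 1) keeps its full F1 score, while one that splits cases into many fragments is penalized even at identical precision and recall.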
