An Improved Outlier Detection Model for Detecting Intrinsic Plagiarism

Nasreen J Kadhim, Maysaa I Abdulhussain Almulla Khalaf

doi:10.24996/ijs.2022.63.12.42

Abstract

Intrinsic plagiarism detection addresses cases in which no reference corpus is available, so the task relies entirely on inconsistencies within a given document. It has been treated as a classification problem that can be solved using only information drawn from the document itself. The core contribution of this paper lies in the document representation: the document and the disjoint segments generated from it are represented as weight vectors that capture their main content, and each element of these vectors is assigned its average weight rather than its frequency. The proposed method is evaluated in terms of Precision, Recall, F-measure, Granularity, and Plagdet, the last of which measures the overall performance of a plagiarism detection approach. The results are comparable to those of the best state-of-the-art methods. Applied to PAN-PC-09 and PAN-PC-11 for intrinsic plagiarism detection, the method achieves Recall scores of 0.4503 and 0.4303, respectively, although Precision (0.3308 and 0.2806) and Granularity (1.1765 and 1.1111) need further improvement. The corresponding F-measure scores are 0.3814 and 0.3397, and the Plagdet scores are 0.3399 and 0.3151.
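The relationship between the reported scores can be checked with a short sketch. It assumes the standard PAN definitions, which the abstract does not state explicitly: F-measure as the harmonic mean of Precision and Recall, and Plagdet as F-measure divided by log2(1 + Granularity). The function names `f_measure` and `plagdet` are illustrative and are not taken from the paper.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Assumed PAN definition: F-measure penalised by detection granularity.

    Granularity is at least 1, so the divisor log2(1 + granularity) is at least 1.
    """
    return f_measure(precision, recall) / math.log2(1 + granularity)


# (Precision, Recall, Granularity) as reported in the abstract.
reported = {
    "PAN-PC-09": (0.3308, 0.4503, 1.1765),
    "PAN-PC-11": (0.2806, 0.4303, 1.1111),
}

for corpus, (p, r, g) in reported.items():
    print(f"{corpus}: F-measure={f_measure(p, r):.4f}, Plagdet={plagdet(p, r, g):.4f}")
```

Under these assumed definitions, the computed F-measure and Plagdet values agree with the figures reported above to four decimal places, which also shows how a Granularity above 1 lowers Plagdet relative to F-measure.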
