Abstract

Intrinsic plagiarism detection addresses cases in which no reference corpus is available, so the task relies entirely on inconsistencies within a given document. Detection of intrinsic plagiarism is treated here as a classification problem that is solved using only information drawn from the document itself. The core contribution of this paper lies in the document representation: the document and the disjoint segments generated from it are represented as weight vectors that capture their main content, where each element of a vector holds its average weight rather than its frequency. The proposed method is evaluated in terms of Precision, Recall, F-measure, Granularity, and Plagdet, and the results are comparable to those of the best state-of-the-art methods. Applied to PAN-PC-09 and PAN-PC-11 for intrinsic plagiarism detection, the method records Recall scores of 0.4503 and 0.4303, respectively, although Precision (0.3308 and 0.2806) and Granularity (1.1765 and 1.1111) still need improvement. The F-measure is 0.3814 and 0.3397, and Plagdet, which measures the overall performance of a plagiarism detection approach, is 0.3399 and 0.3151.
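The relationship between the reported scores can be checked with a short sketch. It assumes the definitions commonly used in the PAN evaluation framework, which the abstract itself does not spell out: F-measure as the harmonic mean of Precision and Recall, and Plagdet as the F-measure divided by log2(1 + Granularity). The function names `f_measure` and `plagdet` are illustrative and not taken from the paper.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Assumed PAN-style Plagdet: F-measure discounted by log2(1 + granularity)."""
    return f_measure(precision, recall) / math.log2(1 + granularity)


# Precision, Recall, Granularity as reported in the abstract.
reported = {
    "PAN-PC-09": (0.3308, 0.4503, 1.1765),
    "PAN-PC-11": (0.2806, 0.4303, 1.1111),
}

for corpus, (p, r, g) in reported.items():
    print(corpus, round(f_measure(p, r), 4), round(plagdet(p, r, g), 4))
```

Under these assumed definitions, the computed values agree with the reported F-measure and Plagdet figures to four decimal places. A Granularity of exactly 1 would leave Plagdet equal to the F-measure, so the gap between the two in both corpora reflects detections that are split into multiple fragments.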
