Abstract

In intrinsic plagiarism detection, we aim to discover plagiarised passages in a text based on stylistic changes and inconsistencies within the document itself. The main idea is to profile the style of the original author and to mark as outliers the passages that differ significantly from that profile. Besides some novel stylistic and semantic features, the present work proposes a new approach to the problem in which machine learning plays a significant role. Notably, we also consider, for the first time, the imbalance of the training dataset, a reality of intrinsic plagiarism detection, as a major parameter of the problem. Our detection system is tested on the corpora of the PAN Webis intrinsic plagiarism detection shared tasks of 2009 and 2011 and is compared with the results of the highest-scoring participants.
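The core idea of profiling a document's style and marking deviating passages as outliers can be illustrated with a minimal sketch. This is not the authors' system: the fixed word windows, character trigram frequency profiles, cosine distance to the whole-document profile, and the "mean plus k standard deviations" threshold are all assumptions chosen for illustration, and the function names are hypothetical. The sketch also leaves out the machine-learning component and the handling of unbalanced training data described above.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the character n-grams of the lower-cased text (a simple style profile)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def split_into_windows(words: list[str], size: int) -> list[str]:
    """Split a word list into consecutive, non-overlapping windows of `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def flag_style_outliers(text: str, window: int = 50, n: int = 3, k: float = 2.0) -> list[int]:
    """Return indices of windows whose style profile deviates from the whole document.

    Hypothetical baseline: a window is flagged when its distance to the
    document profile exceeds mean + k * (population) standard deviation.
    """
    segments = split_into_windows(text.split(), window)
    if len(segments) < 2:
        return []  # not enough windows to estimate a distance distribution
    doc_profile = char_ngrams(text, n)
    dists = [cosine_distance(char_ngrams(s, n), doc_profile) for s in segments]
    threshold = mean(dists) + k * pstdev(dists)
    return [i for i, d in enumerate(dists) if d > threshold]
```

For example, a text made of nine identical 10-word blocks with one stylistically foreign 10-word block inserted at position 5 yields `[5]` with `window=10`. Distance to the whole-document profile is used here only because it needs no external reference corpus; a supervised system would instead feed such distances, among other features, to a trained classifier.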
