Evaluation of Fingerprint Selection Algorithms for Local Text Reuse Detection

Gints Jēkabsons

doi:10.2478/acss-2020-0002

Abstract

Detection of local text reuse is central to a variety of applications, including plagiarism detection, origin detection, and information flow analysis. This paper evaluates and compares the effectiveness of fingerprint selection algorithms for the source retrieval stage of local text reuse detection. In total, six algorithms are compared: Every p-th, 0 mod p, Winnowing, Hailstorm, Frequency-biased Winnowing (FBW), and the proposed modified version of FBW (MFBW). Most previously published studies of local text reuse detection are based on datasets containing text reuse that is artificially generated, long, or unobfuscated. In this study, to evaluate the performance of the algorithms, a new dataset has been built containing real text reuse cases from Bachelor's and Master's theses (written in English in the field of computer science), where about half of the cases involve less than 1 % of the document text and about two-thirds of the cases involve paraphrasing. In the performed experiments, the best overall detection quality is achieved by Winnowing, 0 mod p, and MFBW. The proposed MFBW algorithm is a considerable improvement over FBW and is among the best-performing algorithms. The software developed for this study is freely available at the author's website http://www.cs.rtu.lv/jekabsons/.
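To make the compared selection strategies concrete, here is a minimal Python sketch of four of them as they are commonly described in the literature: Every p-th, 0 mod p, Winnowing (after Schleimer et al.), and Hailstorm (after Bernstein and Zobel). It is not the author's software. The word n-gram tokenization, the CRC-32 hash, and the helper names `ngram_hashes` and `shared_hashes` are hypothetical choices made only for illustration. FBW and MFBW are omitted because this abstract does not describe how they work.

```python
import zlib
from typing import List, Set, Tuple

# A fingerprint is (position of the hashed unit in the document, hash value).
Fingerprint = Tuple[int, int]


def ngram_hashes(text: str, n: int = 5) -> List[int]:
    """Hash every word n-gram (assumed preprocessing: lowercase, split on whitespace)."""
    words = text.lower().split()
    return [zlib.crc32(" ".join(words[i:i + n]).encode("utf-8"))
            for i in range(len(words) - n + 1)]


def every_pth(hashes: List[int], p: int) -> List[Fingerprint]:
    """Every p-th: keep the hashes at positions 0, p, 2p, ... (position-based)."""
    return [(i, h) for i, h in enumerate(hashes) if i % p == 0]


def zero_mod_p(hashes: List[int], p: int) -> List[Fingerprint]:
    """0 mod p: keep the hashes whose value is divisible by p (content-based)."""
    return [(i, h) for i, h in enumerate(hashes) if h % p == 0]


def winnowing(hashes: List[int], w: int) -> List[Fingerprint]:
    """Winnowing: in every window of w consecutive hashes pick the minimum
    (the rightmost one on ties) and record each picked position once."""
    if not hashes:
        return []
    w = min(w, len(hashes))  # a short document forms a single window
    selected: List[Fingerprint] = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        pos = start + w - 1 - window[::-1].index(m)  # rightmost minimum
        if pos != last_pos:  # picked positions never move left, so this avoids duplicates
            selected.append((pos, m))
            last_pos = pos
    return selected


def hailstorm(text: str, n: int = 5) -> List[Fingerprint]:
    """Hailstorm: hash each word, then select the word n-gram starting at i
    if its lowest word hash occurs at its first or its last word."""
    words = text.lower().split()
    word_hashes = [zlib.crc32(word.encode("utf-8")) for word in words]
    selected: List[Fingerprint] = []
    for i in range(len(words) - n + 1):
        window = word_hashes[i:i + n]
        m = min(window)
        if window[0] == m or window[-1] == m:
            gram = " ".join(words[i:i + n])
            selected.append((i, zlib.crc32(gram.encode("utf-8"))))
    return selected


def shared_hashes(a: List[Fingerprint], b: List[Fingerprint]) -> Set[int]:
    """Hypothetical helper: hash values selected in both documents,
    e.g. for ranking candidate sources during source retrieval."""
    return {h for _, h in a} & {h for _, h in b}


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank today"
    suspect = "yesterday the quick brown fox jumps over the lazy dog near the river"
    fp_source = winnowing(ngram_hashes(source, 3), 4)
    fp_suspect = winnowing(ngram_hashes(suspect, 3), 4)
    print(len(shared_hashes(fp_source, fp_suspect)), "shared fingerprints")
```

Every p-th selects purely by position, so a single insertion early in a document shifts all later selections. The other three select by content, and Winnowing additionally guarantees at least one fingerprint in every window of w consecutive hashes, which 0 mod p does not.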


Journal: Applied Computer Systems
Publication Date: May 1, 2020
Citations: 2
License type: CC BY 4.0
