Abstract
Plagiarism detection can generally be divided into two main stages: source retrieval and text alignment. This paper evaluates and compares the effectiveness of five fingerprint selection algorithms used during the source retrieval stage: Every p-th, 0 mod p, Winnowing, Frequency-biased Winnowing (FBW), and Modified FBW (MFBW). The algorithms are evaluated on a dataset containing plagiarism cases in Bachelor's and Master's theses written in English in the field of computer science. The best performance is achieved by 0 mod p, Winnowing, and MFBW. For these algorithms, reducing the fingerprint size from 100 % to about 20 % kept effectiveness at approximately the same level. Moreover, MFBW sends fewer document pairs overall to the text alignment stage, thereby also reducing the computational cost of the process. The software developed for this study is freely available at the author’s website http://www.cs.rtu.lv/jekabsons/.
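To make the idea of fingerprint selection more concrete, the following is a minimal Python sketch of three of the five algorithms named above (Every p-th, 0 mod p, and Winnowing), written from their commonly described definitions rather than taken from the software developed for the study. The function names, the character-level k-grams, the MD5-based hash, and the default value of `k` are all assumptions made for illustration. FBW and MFBW are omitted because the abstract does not define them.

```python
import hashlib


def kgram_hashes(text, k=5):
    """Hash every k-character window of the text.

    Assumption: character-level k-grams and a truncated MD5 hash;
    the study may tokenize and hash differently.
    """
    return [
        int.from_bytes(hashlib.md5(text[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(text) - k + 1)
    ]


def every_pth(hashes, p):
    """Keep the hash at every p-th position (positions 0, p, 2p, ...)."""
    return set(hashes[::p])


def zero_mod_p(hashes, p):
    """Keep every hash whose value is divisible by p."""
    return {h for h in hashes if h % p == 0}


def winnowing(hashes, w):
    """Keep the minimum hash of every sliding window of w consecutive hashes.

    Returns an empty set if fewer than w hashes are available.
    """
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def overlap(fingerprint_a, fingerprint_b):
    """Number of shared hashes, a simple hypothetical similarity signal."""
    return len(fingerprint_a & fingerprint_b)
```

In a source retrieval setting, each document would be reduced to such a set of selected hashes, and only document pairs whose fingerprints overlap sufficiently would be passed on to text alignment. Every p-th and 0 mod p keep roughly one hash in p, while Winnowing guarantees at least one selected hash per window of `w` consecutive hashes, so these parameters control the fingerprint size.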