Searching source code fragments using incremental clustering

Michal Ďuračík, Emil Kršák, Patrik Hrkút

doi:10.1002/cpe.5416

Abstract

Plagiarism is becoming an increasingly serious problem in academic environments. In this paper, we deal with a specific kind of plagiarism: source code plagiarism. For this kind of plagiarism, no software is available that can detect it on a larger scale (hundreds of student submissions every year). We propose algorithms for source code parsing and processing as part of a complex plagiarism detection system. Vectorizing source code into characteristic vectors is a vital part of the whole process, and the k‐means algorithm is used to classify and cluster these vectors. Because student assignments are submitted regularly, a plagiarism detection system needs to handle them as they arrive. For this reason, we propose a modified incremental k‐means algorithm and a method for determining the number of clusters. We also consider methods for searching vectors among clusters and suggest using conditional entropy to select the important vector elements used in the search algorithm. Our results show how the proposed algorithms and methods perform on real student submissions.
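The abstract describes assigning new submissions to clusters as they arrive and then searching for similar vectors only within the matching cluster. The sketch below illustrates that general idea with a plain online k‐means update (a MacQueen-style running mean) combined with a distance threshold for opening new clusters; it is not the authors' modified algorithm, and it omits their conditional-entropy element selection. It assumes each source code fragment has already been converted into a fixed-length numeric characteristic vector. The names `IncrementalKMeans`, `Cluster`, `insert`, `search`, and the `new_cluster_threshold` parameter (a simple stand-in for determining the number of clusters) are hypothetical.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Cluster:
    centroid: Vector
    members: list[tuple[str, Vector]] = field(default_factory=list)

    def add(self, key: str, vec: Vector) -> None:
        """Store a vector and move the centroid to the running mean of all members."""
        self.members.append((key, vec))
        n = len(self.members)
        self.centroid = [c + (v - c) / n for c, v in zip(self.centroid, vec)]


class IncrementalKMeans:
    """Online k-means: each new vector updates only its nearest cluster.

    HYPOTHETICAL: `new_cluster_threshold` is an assumed stand-in for
    choosing the number of clusters; it is not taken from the paper.
    """

    def __init__(self, new_cluster_threshold: float) -> None:
        self.threshold = new_cluster_threshold
        self.clusters: list[Cluster] = []

    def _nearest(self, vec: Vector) -> Cluster:
        # Linear scan over centroids; cheap because there are far fewer clusters than vectors.
        return min(self.clusters, key=lambda c: distance(c.centroid, vec))

    def insert(self, key: str, vec: Vector) -> Cluster:
        """Assign a new submission's vector without re-clustering earlier data."""
        if self.clusters:
            best = self._nearest(vec)
            if distance(best.centroid, vec) <= self.threshold:
                best.add(key, vec)
                return best
        # No cluster is close enough (or none exist yet): open a new one.
        cluster = Cluster(centroid=list(vec))
        cluster.add(key, vec)
        self.clusters.append(cluster)
        return cluster

    def search(self, vec: Vector, k: int = 3) -> list[tuple[str, float]]:
        """Return the k closest stored vectors, searching only the nearest cluster."""
        if not self.clusters:
            return []
        cluster = self._nearest(vec)
        ranked = sorted(
            ((key, distance(v, vec)) for key, v in cluster.members),
            key=lambda pair: pair[1],
        )
        return ranked[:k]


# Usage: two well-separated groups of (made-up) characteristic vectors.
model = IncrementalKMeans(new_cluster_threshold=5.0)
for key, vec in [("a1", [1.0, 0.0, 0.0]), ("a2", [2.0, 0.0, 0.0]),
                 ("b1", [0.0, 10.0, 0.0]), ("b2", [0.0, 11.0, 1.0])]:
    model.insert(key, vec)
print(len(model.clusters))                   # 2
print(model.search([0.0, 10.0, 0.0], k=1))  # [('b1', 0.0)]
```

Restricting the search to the nearest cluster trades exhaustive recall for speed: a near-duplicate that landed in a neighbouring cluster can be missed, which is one reason a real system might probe several of the closest clusters instead of one.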


Journal: Concurrency and Computation: Practice and Experience
Publication Date: Jun 23, 2019
Citations: 5

Similar Papers

EPlag: A two layer source code plagiarism detection system
Omer Ajmal ... M M Saad Missen
01 Sep 2013

A state of art on source code plagiarism detection
Mayank Agrawal ... Dilip Kumar Sharma
01 Oct 2016

Review of Source Code Plagiarism Detection Techniques
01 Jan 2021

Issues Related to the Detection of Source Code Plagiarism in Students Assignments
01 Jan 2014
