Detecting Copy Directions among Programs Using Extreme Learning Machines

Bin Wang, Xiaochun Yang, Guoren Wang

doi:10.1155/2015/793697

Open Access

Journal: Mathematical Problems in Engineering
Publication Date: Jan 1, 2015
Citations: 29
License type: CC BY 3.0

Affiliation: Northeastern University

Abstract

Because software development is complex, some developers plagiarize source code from other projects or from open source software to shorten the development cycle. Many methods have been proposed to detect plagiarism among programs based on the program dependence graph, a graph representation of a program. However, to the best of our knowledge, existing work detects only similarity between programs and does not determine the copy direction between them. Using an extreme learning machine (ELM), we construct a feature space that describes the features of every pair of programs with a possible plagiarism relationship. Such a feature space can be large and time-consuming to process, so we propose approaches that construct a smaller feature space by pruning isolated control statements and removable statements from each program, which reduces both training and classification time. We also analyze the features of data dependencies between an original program and its copy, and on that basis we propose a feedback framework for finding a feature space that achieves both accuracy and efficiency. We conducted a thorough experimental study of this technique on real C programs collected from the Internet. The experimental results show that our ELM-based approaches achieve high accuracy and efficiency.
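For readers unfamiliar with ELMs, the following is a minimal sketch of a generic extreme learning machine used as a binary classifier: hidden-layer weights are drawn at random and left fixed, and only the output weights are fitted, here by ridge-regularized least squares. It is not the authors' implementation. The class name `TinyELM`, the helper `solve_linear`, the two-value feature vectors, and the labels (+1 for "first program is the copy", -1 for the reverse) are all hypothetical and chosen only for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class TinyELM:
    """Generic single-hidden-layer ELM for labels in {+1, -1} (illustrative)."""

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Random input weights and biases; these are never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x):
        # Sigmoid activation of each random projection of x.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Normal equations: (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(hth, hty)
        return self

    def predict(self, x):
        score = sum(bk * hk for bk, hk in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


# Made-up pairwise features for eight program pairs (not data from the paper).
TOY_PAIRS = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [1.0, 0.2],
             [0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.2, 1.0]]
TOY_LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

model = TinyELM(n_inputs=2).fit(TOY_PAIRS, TOY_LABELS)
```

Because the only fitted parameters are the output weights, training reduces to one linear solve whose cost depends on the width of the feature and hidden layers. This is one reason a smaller feature space can shorten both training and classification.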

Full Text