Automatic Generation of Plagiarism Detection Among Student Programs

Rachel Edita Roxas, Natasja Bautista, Nathalie Rose Lim

doi:10.1109/ithet.2006.339768

Abstract

This paper presents a system for automatically generating plagiarism detectors that find similar programs in a set of student programs. Existing plagiarism detectors are either general-purpose or tied to a single programming language or a pre-defined set of languages. General-purpose detectors usually rely on string matching, with similarity measures designed for plagiarism detection among documents in general rather than programs in particular, and so lose much of the structure and logic of the programs. Language-specific detectors, on the other hand, cater only to their particular set of languages. This study provides a means for the user to specify the programming language of the student programs to be analyzed. Moreover, an automatic plagiarism detector must be immune to the transformations that students perform on copied programs. These transformations usually depend on several factors, namely the type of programming problem and, correspondingly, the complexity of the project to be implemented by the students, as well as the programming language paradigm of the programs. The similarity measures employed by the system should therefore be determined by these factors, and the professor can specify how similarities among the student programs are to be captured. The system provides an interface for specifying the programming language in which the student programs are implemented, and a knowledge base of similarity measures from which the user selects those to include in the analysis. Hence, the system offers flexibility in both the programming language of the student programs to be analyzed and the similarity measures that the professor wishes to employ.
Initial qualitative and quantitative evaluations indicate that the system is a flexible, convenient, and cost-effective tool for building plagiarism detectors for programs in various imperative and procedural programming languages. The approach also addresses some of the changes that students make to copied programs which JPlag fails to handle, thus improving accuracy in terms of reduced false positives and increasing the chance of catching plagiarized programs. These changes include modification of control structures, use of temporary variables and subexpressions, inlining and refactoring of methods, and redundancy (variables or methods that are never used). Comprehensive tests on other programming languages under other paradigms, such as object-oriented, logic, and functional languages, covering the different changes that students make to copied programs (such as the tests done for JPlag), are also recommended for further empirical evaluation.
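
To make the idea of a user-selected knowledge base of similarity measures concrete, the following is a minimal Python sketch, not the system described in the abstract. The keyword set stands in for a user-supplied language specification, and the two measures (`ngram`, `keywords`), the `MEASURES` registry, and the `similarity` function are hypothetical names chosen for illustration. The sketch is only robust to renamed identifiers and changed literals; it does not attempt the harder transformations listed above, such as modified control structures or inlined methods.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a tiny stand-in for a user-supplied language specification.
KEYWORDS = {"if", "else", "while", "for", "return", "int"}

# Identifiers, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> List[str]:
    """Tokenize source, mapping identifiers to ID and integers to NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(a: str, b: str) -> float:
    """Jaccard similarity over normalized token trigrams."""
    ga, gb = ngrams(tokenize(a)), ngrams(tokenize(b))
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def keyword_similarity(a: str, b: str) -> float:
    """Overlap of keyword counts: sum of minima over sum of maxima."""
    ca = Counter(t for t in tokenize(a) if t in KEYWORDS)
    cb = Counter(t for t in tokenize(b) if t in KEYWORDS)
    total = sum((ca | cb).values())
    if total == 0:
        return 1.0
    return sum((ca & cb).values()) / total


# Hypothetical "knowledge base": measures the professor can choose from.
MEASURES: Dict[str, Callable[[str, str], float]] = {
    "ngram": ngram_similarity,
    "keywords": keyword_similarity,
}


def similarity(a: str, b: str, selected: List[str]) -> float:
    """Average the selected measures for a pair of programs."""
    return sum(MEASURES[name](a, b) for name in selected) / len(selected)


original = "int total = 0; while (i < n) { total = total + i; }"
renamed = "int sum = 0; while (k < m) { sum = sum + k; }"
print(similarity(original, renamed, ["ngram", "keywords"]))
```

Because identifiers are normalized before comparison, the two programs above are treated as identical under both measures. Keeping the measures in a registry keyed by name is one simple way to let the user choose which ones apply to a given assignment.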

Full Text