
Tree-Based Comparison for Plagiarism Detection and Automatic Marking of Programming Assignments

Abstract

Programming assignments are usually considered a major assessment component of a programming course. As the number of students enrolling in programming courses has always been high, it is difficult to mark a large number of programming assignments effectively in a short period of time. Moreover, plagiarism of program code has recently become a serious problem, and markers may not be able to locate similar scripts that they have marked before. This paper introduces an online assignment management system which allows programming assignments to be submitted online and marked effectively. The marking of programming assignments involves two processes: plagiarism detection among the submitted source codes, and automatic marking of individual assignments, which includes program testing on different test cases and checking against the model answer. In this paper, we propose the use of parse trees for checking the similarity between program codes. The method can be employed in both plagiarism detection and automatic marking of programming assignments.
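The abstract does not say how the parse trees are compared, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes Python submissions parsed with the standard `ast` module, and the function names `subtree_signatures` and `tree_similarity` are hypothetical. Each subtree is reduced to a structural signature that ignores identifier names and literal values, and two programs are scored by the overlap of their signature multisets.

```python
import ast
from collections import Counter


def subtree_signatures(source: str) -> Counter:
    """Count a structural signature for every subtree of the parse tree.

    Only node types are recorded, so renaming variables or changing
    constants does not alter the signatures.
    """
    counts = Counter()

    def visit(node: ast.AST) -> str:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[signature] += 1
        return signature

    visit(ast.parse(source))
    return counts


def tree_similarity(source_a: str, source_b: str) -> float:
    """Multiset Jaccard similarity of subtree signatures, in [0, 1]."""
    a = subtree_signatures(source_a)
    b = subtree_signatures(source_b)
    shared = sum((a & b).values())   # per-signature minimum counts
    total = sum((a | b).values())    # per-signature maximum counts
    return shared / total if total else 1.0


if __name__ == "__main__":
    original = "total = 0\nfor i in range(10):\n    total += i\n"
    renamed = "s = 0\nfor k in range(10):\n    s += k\n"
    print(tree_similarity(original, renamed))
```

Under this assumption the same score could serve both uses named in the abstract: comparing submissions pairwise for plagiarism, or comparing a submission against a model answer. Because names and constants are ignored, a renamed copy scores the same as a verbatim copy.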

Similar Papers
  • Conference Article
  • Cite Count: 42
  • 10.1109/mipro.2016.7522248
Review of source-code plagiarism detection in academia
  • May 1, 2016
  • Matija Novak

Plagiarism is a big concern in academia and it can be a problem in every course. Plagiarism occurs when someone presents others' work as their own. Students plagiarize in different areas: homework assignments, essays, projects, etc. In this work, the focus is on programming courses and plagiarism in programming assignments. While source-code plagiarism detection is in some ways very similar to text plagiarism detection, it is very different in other ways, so a lot of research focuses specifically on source-code plagiarism. Some questions that are researched in this field are: what is considered plagiarism in programming assignments, how to perform plagiarism detection in programming assignments, how to do it automatically, what tool(s) to use, how students cheat in programming courses, how they try to obfuscate cheating, and many other questions. This work is a review of important research papers in the field of source-code plagiarism detection in academia. This paper tries to answer some of the mentioned research questions and to indicate directions for future work.

  • Conference Article
  • Cite Count: 18
  • 10.1145/3021460.3021473
Unsupervised Learning Based Approach for Plagiarism Detection in Programming Assignments
  • Feb 5, 2017
  • Jitendra Yasaswi + 4 more

In this work, we propose a novel hybrid approach for automatic plagiarism detection in programming assignments. Most of the well-known plagiarism detectors either employ a text-based approach or use features based on the properties of the program at a syntactic level. However, both these approaches succumb to code obfuscation, which is a huge obstacle for automatic software plagiarism detection. Our proposed method uses static features extracted from the intermediate representation of a program in a compiler infrastructure such as gcc. We demonstrate the use of unsupervised learning techniques on the extracted feature representations and show that our system is robust to code obfuscation. We test our method on assignments from an introductory programming course. The preliminary results show that our system performs better than other popular tools like MOSS. For visualizing the local and global structure of the features, we obtained low-dimensional representations of our features using a popular technique called t-SNE, a variation of Stochastic Neighbor Embedding, which can preserve neighborhood identity in low dimensions. Based on this idea of preserving neighborhood identity, we mine interesting information such as the diversity in student solution approaches to a given problem. The presence of well-defined clusters in the low-dimensional visualizations demonstrates that our features are capable of capturing interesting programming patterns.

  • Research Article
  • Cite Count: 20
  • 10.1002/spe.839
Oto, a generic and extensible tool for marking programming assignments
  • Jul 23, 2007
  • Software: Practice and Experience
  • G Tremblay + 3 more

Marking programming assignments in programming courses involves a lot of work: each program must be tested, the source code must be read and evaluated, etc. With the large classes encountered nowadays, the feedback provided to students through marking is thus rather limited, and often late. Tools providing support for marking programming assignments do exist, ranging from support for administrative aspects through automation of program testing to support for source code evaluation based on metrics. In this paper, we introduce a tool, called Oto, that provides support for submission and marking of assignments. Oto aims at reducing the workload associated with the marking task. Oto also aims at providing timely feedback to the students, including feedback before the final submission. Furthermore, the tool has been designed to be generic and extensible, so that the marking process for a specific assignment can easily be customized and the tool can be extended with various marking components (modules) that allow it to deal with various aspects of marking (testing, style, structure, etc.) and with programs written in various programming languages. Copyright © 2007 John Wiley & Sons, Ltd.

  • Book Chapter
  • Cite Count: 1
  • 10.1007/978-3-642-22456-0_19
The Study of Plagiarism Detection for Program Code
  • Jan 1, 2011
  • Hao Jiang + 1 more

With the increasing popularity of programming courses, cases of plagiarism are also rising rapidly. Detecting plagiarism and verifying the originality of students' program work have become particularly important. By studying existing code similarity measurement techniques, this document focuses on the forward maximum matching algorithm, proposed to improve an existing and efficient segmentation method, while proposing effective marker string replacement rules in order to shorten the length of the string tag. At the same time, this paper proposes a new marker string generation method: generating tag strings in accordance with each function execution sequence, in order to eliminate redundant functions from the test results. Finally, the system takes the RKR-GST algorithm as its token string matching algorithm. The experimental tests have shown that the improvement has a significant effect on plagiarism detection for program code in the long run.
Keywords: plagiarism detection; generating tag strings by the function sequence; RKR-GST algorithm; forward maximum matching algorithm
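To illustrate the token string matching step named above, here is a rough sketch of Greedy String Tiling, the core of RKR-GST. It is a simplified illustration and not the chapter's implementation: the Running Karp-Rabin hashing speed-up is omitted, so this version is much slower on long inputs, and the function names, the `min_match` parameter and the similarity formula are assumptions made for the example.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return non-overlapping tiles as (start_a, start_b, length) tuples.

    Repeatedly finds the longest common runs of unmarked tokens,
    marks them as tiles, and stops when no run of at least
    `min_match` tokens remains.
    """
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the run while tokens agree and are still unmarked.
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    matches, max_match = [(i, j, k)], k
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip matches occluded by a tile created earlier in this pass.
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                for offset in range(k):
                    marked_a[i + offset] = True
                    marked_b[j + offset] = True
                tiles.append((i, j, k))


def tiling_similarity(a, b, min_match=3):
    """Fraction of tokens covered by tiles, in [0, 1]."""
    if not a and not b:
        return 1.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))
```

Both functions accept any sequences of comparable tokens, such as strings or lists of token names. Moving a block of code to a different position does not hide it, because tiles are matched independently of where they occur.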

  • Research Article
  • Cite Count: 43
  • 10.3390/app10134566
Automated Assessment and Microlearning Units as Predictors of At-Risk Students and Students’ Outcomes in the Introductory Programming Courses
  • Jun 30, 2020
  • Applied Sciences
  • Jan Skalka + 1 more

The number of students who decide to enrol in information technology related study programs is continually increasing. Introductory programming courses represent the most crucial milestone in information technology education and often reflect students' ability to think abstractly and systematically, solve problems, and design their solutions. Even though many students who attend universities have already completed some introductory courses of programming, there is still a large group of students with limited programming skills. This drawback often increases during the first term, and it is often the main reason why students leave their studies too early. There is a myriad of technologies and tools which can be involved in the programming course to increase students' chances of mastering programming. The introductory programming courses used in this study have been gradually extended over four academic years with automated source code assessment of students' programming assignments, followed by the implementation of a set of suitably designed microlearning units. The final four datasets were analysed to confirm the suitability of automated assessment and microlearning units as predictors of at-risk students and students' outcomes in the introductory programming courses. The research results proved the significant contribution of automated code assessment to students' learning outcomes in the elementary topics of learning programming. Simultaneously, they proved a moderate to strong dependence between the students' activity and achievement in the activities and final students' outcomes.

  • Research Article
  • Cite Count: 20
  • 10.1145/1869746.1869766
PlagDetect
  • Dec 1, 2010
  • ACM Inroads
  • Z A Al-Khanjari + 3 more

Practical computing courses that involve a significant amount of programming assessment tasks suffer from e-Plagiarism. A pragmatic solution to this problem could be to discourage plagiarism, particularly among beginners in programming. One way to address this is to automate the detection of plagiarized work during the marking phase. Our research in this context involves first examining various metrics used in plagiarism detection in program codes and secondly selecting an appropriate statistical measure using attribute counting metrics (ATMs) for detecting plagiarism in Java programming assignments. The goal of this investigation is to study the effectiveness of ATMs for detecting plagiarism among assignment submissions of introductory programming courses.
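The abstract names neither the attributes counted nor the statistical measure selected, so the sketch below only illustrates the general shape of an attribute counting comparison. The chosen attributes, the abbreviated keyword set, the regular expression tokenizer and the use of cosine similarity are all assumptions made for this example.

```python
import math
import re

# Hypothetical, deliberately incomplete keyword list for illustration.
JAVA_KEYWORDS = {"if", "else", "for", "while", "return", "int",
                 "class", "public", "static", "void", "new"}
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def attribute_counts(source: str) -> list:
    """Reduce a program to a small vector of counted attributes."""
    tokens = TOKEN_PATTERN.findall(source)
    words = [t for t in tokens if t[0].isalpha() or t[0] == "_"]
    keywords = [w for w in words if w in JAVA_KEYWORDS]
    identifiers = {w for w in words if w not in JAVA_KEYWORDS}
    symbols = [t for t in tokens if not (t[0].isalnum() or t[0] == "_")]
    return [
        len(tokens),               # total tokens
        len(keywords),             # keyword occurrences
        len(identifiers),          # distinct identifiers
        len(symbols),              # operator and punctuation characters
        len(source.splitlines()),  # lines of code
    ]


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Renaming variables leaves every count unchanged, so a renamed copy yields an identical vector. Unrelated programs can also produce similar counts, so a high score from a measure like this is best treated as a prompt for manual inspection.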

  • Conference Article
  • Cite Count: 9
  • 10.1109/fie.2018.8658531
Token-based Approach for Real-time Plagiarism Detection in Digital Designs
  • Oct 1, 2018
  • Han Wan + 2 more

This Research to Practice Work in Progress Paper presents a token-based approach to detecting plagiarism in university courses with hardware programming assignments. Detecting plagiarism manually is difficult and time-consuming work. In the last two decades, various plagiarism detection tools have been developed. These techniques can be mainly divided into the following categories: Textual Match, Program Dependence Graph Comparison, Abstract Syntax Tree Analysis and Low-Level Form Code Comparison. Although there has been a lot of research on detecting code clones in software programming languages (e.g. Basic, C/C++, Java, Python, etc.), research focused on hardware description languages is still lacking. Based on the effectiveness of the locality-sensitive hash function simhash, which is usually used for detecting near duplicates in web crawling, we propose an improved real-time plagiarism detection approach for Verilog HDL (hardware description language) programming assignments. The core detection steps are extracting weighted tokens from source code as a high-dimensional feature, and mapping it to an f-bit fingerprint with the simhash technique. On account of the syntax characteristics of Verilog HDL, a token extraction strategy was designed to maximize the valid information that a fixed-length hash value can represent. Experiments over real course data sets were conducted to evaluate the performance of the token-based approach compared with an existing plagiarism detection tool (Moss). The results show that our token-based approach is adequate for plagiarism detection in digital designs for both online queries and batch queries. Furthermore, the token-based approach enables incremental plagiarism detection for a single submission without excessive overhead. Finally, we also discuss the limitations of the current approach and future research directions.
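The two core steps described above, weighted token extraction and mapping to an f-bit fingerprint, can be sketched roughly as follows. This is a generic simhash illustration and not the paper's Verilog-specific token extraction strategy: the regular expression tokenizer, the use of token frequency as weight, and SHA-256 as the per-token hash are assumptions made for the example.

```python
import hashlib
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source: str) -> list:
    """Hypothetical tokenizer: identifiers, numbers, single symbols."""
    return TOKEN_PATTERN.findall(source)


def simhash(weighted_tokens: dict, bits: int = 64) -> int:
    """Map {token: weight} to a `bits`-bit fingerprint.

    `bits` must be a multiple of 8 and at most 256, because each
    token hash is taken from the leading bytes of a SHA-256 digest.
    """
    totals = [0] * bits
    for token, weight in weighted_tokens.items():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each token votes +weight or -weight on every bit position.
            if (token_hash >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    fingerprint = 0
    for i in range(bits):
        if totals[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    first = simhash(Counter(tokenize("assign y = a & b;")))
    second = simhash(Counter(tokenize("assign y = a | b;")))
    print(hamming_distance(first, second))
```

Near-duplicate sources tend to produce fingerprints that differ in only a few bits. Because each submission reduces to one fixed-length integer, a new submission can be compared against stored fingerprints without reprocessing earlier submissions, which is consistent with the incremental detection the abstract mentions.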

  • Research Article
  • Cite Count: 1
  • 10.1142/s0218126623502869
Applying Coding Behavior Features to Student Plagiarism Detection on Programming Assignments
  • May 26, 2023
  • Journal of Circuits, Systems and Computers
  • Zheng Li + 4 more

In programming education, the result of plagiarism detection is a crucial criterion for assessing whether or not students can pass course exams. The prevalent methods recently proposed for detecting student plagiarism analyze source code. These methods extract features (such as tokens, abstract syntax trees and control flow graphs) from the source code, examine the similarity of codes using various similarity detection methods, and then perform plagiarism detection based on a predefined plagiarism threshold. However, these previous methods for plagiarism detection have some problems. First, they are less effective in detecting code modification related to structure. Second, they require a considerable amount of training data, which demands high computing time and space. Third, they cannot determine in time whether students plagiarize. We propose a novel plagiarism detection method that analyzes the behavioral features of students during the coding process. Specifically, we extract five behavioral features based on students' programming habits. Then, we use a feature ranking-based suspiciousness algorithm to obtain the possibility of student plagiarism. Based on our proposed method, we develop the Online Integrated Programming Platform. To evaluate the accuracy of our method, we conduct a series of experiments. Final experimental results indicate that our method achieves promising results with Accuracy, Precision, Recall and [Formula: see text] values of 0.95, 0.90, 0.95 and 0.92, respectively. Finally, we also analyze the correlation between whether students plagiarized and their regular and final grades, which can further verify the effectiveness of our proposed method.

  • Research Article
  • Cite Count: 148
  • 10.1080/08993408.2012.655091
My program is ok – am I? Computing freshmen's experiences of doing programming assignments
  • Mar 1, 2012
  • Computer Science Education
  • Päivi Kinnunen + 1 more

This article provides insight into how computing majors experience the process of doing programming assignments in their first programming course. This grounded theory study sheds light on the various processes and contexts through which students constantly assess their self-efficacy as a programmer. The data consists of a series of four interviews conducted with a purposeful sample of nine computer science majors in a research intensive state university in the United States. Use of the constant comparative method elicited two forms of results. First, we identified six stages of doing a programming assignment. Analysis captures the dimensional variation in students' experiences with programming assignments on a detailed level. We identified a core category resulting from students' reflected emotions in conjunction with self-efficacy assessment. We provide a descriptive model of how computer science majors build their self-efficacy perceptions, reported via four narratives. Our key findings are that some students reflect negative views of their efficacy, even after having a positive programming experience and that in other situations, students having negative programming experiences still have a positive outlook on their efficacy. We consider these findings in light of possible languages and support structures for introductory programming courses.

  • Conference Article
  • Cite Count: 24
  • 10.1109/educon.2018.8363346
Benefits and drawbacks of source code plagiarism detection in engineering education
  • Apr 1, 2018
  • Dieter Pawelczak

Source code plagiarism is widespread in beginners' programming courses, especially if programming is a minor subject, as for instance in engineering degrees. It is very tempting for students during a programming assignment to use a working copy from a fellow student rather than struggling with the time-consuming coding by themselves. But as learning programming requires a significant personal commitment, we confirm the results of other studies that cheating leads to higher failure rates and lower scores in the examination. Automatic plagiarism detection systems are therefore measures against cheating. We analyzed the students' achievements and opinions during the last 5 years of operating an automated assessment system with plagiarism detection. The paper discusses in detail the benefits of such a system, e.g. the equal treatment of all students compared to manual plagiarism checks, and also shows the disadvantages, e.g. code obfuscation that students perform in order to circumvent the system.

  • Conference Article
  • Cite Count: 1
  • 10.1109/csse.2008.620
Disputed Authorship in C Program Code after Detection of Plagiarism
  • Jan 1, 2008
  • Jie Zhao + 2 more

In recent years, disputed authorship in source code has been congruently neglected after detection of plagiarism has been heatedly debated for a long period; theoretical and feasible measures have been taken to weigh its property. In this paper, according to the existence of individual but distinct programming styles, the authors, with the aid of SVM (support vector machine), one tool of neural networks, and a series of algorithms, examine a brand-new application to work out who is the real writer of a C program among many reputed ones. The whole algorithm in SVM is a process carried out as follows: first, the frequencies refined as training sets through math formulas such as the Gauss axiom and the Lagrange formula are classified into value 1 or -1; then the neural networks are traversed to get the result sequence. It is a feasible approach to determining authorship in C programming after detection of plagiarism.

  • Research Article
  • Cite Count: 24
  • 10.15388/infedu.2016.06
Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments
  • Apr 13, 2016
  • Informatics in Education
  • Dragutin Kermek + 1 more

In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes to it, like renaming variables. Several tools like Sherlock, JPlag and Moss have been devised to detect source code plagiarism. However, for larger student assignments and projects that involve a lot of source code files these tools are not so effective. Also, issues may occur when source code is given to students in class so they can copy it. In such cases these tools do not provide satisfactory results and reports. In this study, we present an improved process model for plagiarism detection when multiple student files exist and allowed source code is present. In this research we use the Sherlock detection tool, although the presented process model can be combined with any plagiarism detection engine. The proposed model is tested on assignments in three courses in two subsequent academic years.

  • Research Article
  • Cite Count: 1
  • 10.47164/ijngc.v13i5.964
PLAGIARISM DETECTION IN PROGRAMMING USING PERFORMANCE ANALYZING FEATURES
  • Nov 26, 2022
  • International Journal of Next-Generation Computing
  • D.S Adane + 4 more

In recent years, plagiarism that uses the code snippets or programs of others without permission has become a social problem. It is widespread, from very familiar student reports to worldwide academic papers. In this paper, we deal with plagiarism in programming assignments and explain the plagiarism patterns often found in text. Existing plagiarism detection tools utilize string matching algorithms to calculate the degree of plagiarism. We bring to light the problems associated with existing tools and propose a method to rectify them efficiently with the help of the algorithms proposed in the paper. We combine the existing detection method with heuristics, namely estimation of time complexity and loop detection, to improve the accuracy of identifying the plagiarized sections, and propose the result as a plagiarism detection method.

  • Research Article
  • Cite Count: 91
  • 10.1109/te.2007.906778
Detection of Plagiarism in Programming Assignments
  • May 1, 2008
  • IEEE Transactions on Education
  • Francisco Rosales + 5 more

Laboratory work assignments are very important for computer science learning. Over the last 12 years many students have been involved in solving such assignments in the authors' department, having reached a figure of more than 400 students doing the same assignment in the same year. This number of students has required teachers to pay special attention to conceivable plagiarism cases. A plagiarism detection tool has been developed as part of a full toolset for helping in the management of the laboratory work assignments. This tool defines and uses four similarity criteria to measure how similar two assignment implementations are. The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.

  • Research Article
  • Cite Count: 51
  • 10.1007/s11042-018-5827-6
Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology
  • Mar 12, 2018
  • Multimedia Tools and Applications
  • Farhan Ullah + 5 more

The multimedia-based e-Learning methodology provides virtual classrooms to students. The teacher uploads learning materials, programming assignments and quizzes to the university's Learning Management System (LMS). The students learn lessons from uploaded videos and then solve the given programming tasks and quizzes. Source code plagiarism is a serious threat to academia. However, identifying similar source code fragments between different programming languages is a challenging task. To solve the problem, this paper proposes a new plagiarism detection technique between C++ and Java source codes based on semantics in multimedia-based e-Learning and smart assessment methodology. First, it transforms source codes into tokens to calculate semantic similarity in a token-by-token comparison. After that, it finds the semantic similarity as a scalar value for the complete source codes written in C++ and Java. For the experiment, we used a dataset consisting of four (4) case studies, Factorial, Bubble Sort, Binary Search and Stack data structure, in both C++ and Java. The entire experiment is done in R Studio with R version 3.4.2. The experimental results show better semantic similarity results for plagiarism detection based on comparison.
