Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology
The multimedia-based e-Learning methodology provides virtual classrooms to students. The teacher uploads learning materials, programming assignments and quizzes to the university's Learning Management System (LMS). The students learn lessons from uploaded videos and then solve the given programming tasks and quizzes. Source code plagiarism is a serious threat to academia, and identifying similar source code fragments across different programming languages is a challenging task. To address this problem, this paper proposes a new semantics-based plagiarism detection technique for C++ and Java source code within a multimedia-based e-Learning and smart assessment methodology. First, it transforms the source code into tokens and calculates semantic similarity through token-by-token comparison. It then computes a scalar semantic similarity value for the complete C++ and Java source codes. For the experiment, we use a dataset consisting of four (4) case studies, Factorial, Bubble Sort, Binary Search and the Stack data structure, each implemented in both C++ and Java. The entire experiment is carried out in RStudio with R version 3.4.2. The experimental comparison shows better semantic similarity results for plagiarism detection.
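The token-based comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the `CANONICAL` mapping, the regular-expression tokenizer and the use of cosine similarity over token counts as the scalar score are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical mapping from language-specific tokens to shared categories,
# so that equivalent C++ and Java constructs compare as equal.
CANONICAL = {
    "cout": "PRINT", "println": "PRINT", "printf": "PRINT",
    "cin": "READ", "Scanner": "READ",
    "boolean": "bool", "String": "string",
}

# Identifiers/keywords, integer literals, or single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    """Split source text into tokens and map known tokens to shared names."""
    return [CANONICAL.get(tok, tok) for tok in TOKEN_RE.findall(source)]


def cosine_similarity(tokens_a, tokens_b):
    """Scalar similarity in [0, 1] computed from token frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


cpp_code = "int fact(int n) { if (n <= 1) return 1; return n * fact(n - 1); }"
java_code = "static int fact(int n) { if (n <= 1) return 1; return n * fact(n - 1); }"
score = cosine_similarity(tokenize(cpp_code), tokenize(java_code))
```

A frequency-based score ignores token order, so a real detector would likely combine it with a sequence-aware comparison.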
- Conference Article
34
- 10.1109/ngct.2016.7877421
- Oct 1, 2016
Plagiarism is becoming a serious problem for the intellectual community, and detecting it at various levels is a major issue. The problem becomes more complex when plagiarism must be found in source code, which may be written in the same language or may have been translated into another language. This type of plagiarism is found not only in academic work but also in industries that deal with software design. The major difficulty with source code plagiarism is that different programming languages have different syntax. In this paper the authors explain various techniques and algorithms for discovering plagiarism in source code, so that organizations and academic institutions can detect it using these techniques. The authors also compare these techniques to show how they differ from one another.
- Research Article
- 10.4028/www.scientific.net/amm.373-375.1172
- Aug 1, 2013
- Applied Mechanics and Materials
Because of the complexity of software development, some developers plagiarize source code from other projects or from open source software in order to shorten the development cycle. Usually the copyist modifies and disguises the copied source code to escape plagiarism detection. So far, most algorithms cannot fully detect source code disguised by the copyist, and in particular cannot reliably distinguish the original code from the plagiarized code. In this paper, we summarize and analyze the effect of disguised source code on the detection process, design a strategy to remove that effect, and propose a PDG-based software source code plagiarism detection algorithm. The algorithm can detect the presence of disguised source code and thereby uncover source code plagiarism. We also propose a heuristic rule that enables the detection algorithm to indicate the direction of plagiarism, a capability that no existing algorithm has. We validate the algorithm experimentally.
- Research Article
24
- 10.15388/infedu.2016.06
- Apr 13, 2016
- Informatics in Education
In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes to it, such as renaming variables. Several tools like Sherlock, JPlag and Moss have been devised to detect source code plagiarism. However, for larger student assignments and projects that involve many source code files these tools are not so effective. Issues may also occur when source code is given to students in class and they are allowed to copy it. In such cases these tools do not provide satisfactory results and reports. In this study, we present an improved process model for plagiarism detection when multiple student files exist and allowed source code is present. In this research we use the Sherlock detection tool, although the presented process model can be combined with any plagiarism detection engine. The proposed model is tested on assignments in three courses in two subsequent academic years.
- Conference Article
10
- 10.1109/iccict.2015.7045739
- Jan 1, 2015
Source code plagiarism has been a concern for many teachers in the computer science field, given the easy availability of content in the internet era. We developed a tool for detecting plagiarism in the source code of students learning programming languages, to cater to the needs of teachers and help them monitor students' source code. Currently our tool supports six programming languages, namely C, C++, Java, Perl, Python and PHP. The tool works in three steps: tokenization, followed by N-gram representation of the source code, and then comparison using the Greedy String Tiling algorithm. The response time of our tool is one minute for 50 source code files of 75 lines of code (LOC) each. The feedback given by teachers after using our tool in one of our postgraduate courses in advanced computing is overwhelming. According to them, the results given by the tool are ninety-nine percent correct. We therefore strongly believe that this tool can help to analyse students' true capabilities and help teachers tremendously in plagiarism detection.
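The three steps named in this abstract (tokenization, N-gram representation, Greedy String Tiling) can be sketched in Python as follows. This is a simplified illustration, not the tool's code: the keyword set, the identifier normalization, the choice of 3-grams, and the similarity formula (twice the covered length divided by the combined length) are assumptions, and the tiling routine is the basic quadratic variant without the Karp-Rabin speed-up.

```python
import re

# Assumed keyword subset; a real tool would use a full per-language lexer.
KEYWORDS = {"int", "float", "void", "for", "while", "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    """Step 1: tokens with identifiers and numbers normalized, so that
    renaming variables does not change the token stream."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens, n=3):
    """Step 2: overlapping n-grams of the token stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def greedy_string_tiling(a, b, min_match=1):
    """Step 3: repeatedly mark the longest common unmarked runs as tiles."""
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    tiles = []
    while True:
        max_match, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    max_match, matches = k, [(i, j, k)]
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip matches overlapping a tile already placed in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for x in range(k):
                marked_a[i + x] = marked_b[j + x] = True
            tiles.append((i, j, k))


def similarity(source_a, source_b, n=3):
    """Fraction of both n-gram sequences covered by tiles, in [0, 1]."""
    a, b = ngrams(tokenize(source_a), n), ngrams(tokenize(source_b), n)
    if not a or not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b))
    return 2 * covered / (len(a) + len(b))


original = "int sum = 0; for (int i = 0; i < n; i++) sum += i;"
renamed = "int total = 0; for (int k = 0; k < m; k++) total += k;"
```

Because identifiers are normalized before tiling, `similarity(original, renamed)` evaluates to 1.0 even though every variable was renamed.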
- Research Article
54
- 10.1109/access.2021.3069367
- Jan 1, 2021
- IEEE Access
Source code plagiarism is a long-standing issue in tertiary computer science education. Many source code plagiarism detection tools have been proposed to aid in the detection of source code plagiarism. However, existing detection tools are not robust to pervasive plagiarism-hiding transformations, and as a result can be inaccurate in the detection of plagiarised source code. This article presents BPlag, a behavioural approach to source code plagiarism detection. BPlag is designed to be both robust to pervasive plagiarism-hiding transformations, and accurate in the detection of plagiarised source code. Greater robustness and accuracy are afforded by analysing the behaviour of a program, as behaviour is perceived to be the aspect of a program least susceptible to plagiarism-hiding transformations. BPlag applies symbolic execution to analyse execution behaviour and represent a program in a novel graph-based format. Plagiarism is then detected by comparing these graphs and evaluating similarity scores. BPlag is evaluated for robustness, accuracy and efficiency against 5 commonly used source code plagiarism detection tools. It is then shown that BPlag is more robust to plagiarism-hiding transformations and more accurate in the detection of plagiarised source code, but is less efficient than the compared tools.
- Conference Article
12
- 10.1109/icdim.2013.6693984
- Sep 1, 2013
In academic environments where students are partly evaluated on their assignments, it is necessary to discourage the practice of copying other students' assignments. For large source code repositories, manual detection of plagiarism is fairly complex, if not impossible. Therefore, for fair evaluation there must be a fast, efficient and automated/semi-automated way to detect copied assignments. Source code metrics can be used to detect source code plagiarism in programming assignments submitted by university students. In this paper we have developed a source code plagiarism detection system and tried to improve on existing techniques by separating the suspected files from the non-plagiarized files, thus reducing the dataset for further comparison. A number of source code metrics are calculated and combined using a similarity detection formula to give an aggregate view of the source code metrics. The suspected files are then separated out, and string matching is performed on them to determine the level of similarity.
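The two-stage process in this abstract (filter by aggregated source code metrics, then string-match only the suspected files) can be sketched in Python. The abstract does not give the metrics or the similarity formula, so the four metrics, the min/max ratio averaging and the 0.8 threshold below are assumptions; string matching uses the standard library's `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


def metrics(source):
    """Assumed metric vector: non-blank lines, loops, branches, distinct words."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z_]\w*", source)
    return [
        len(lines),
        sum(w in ("for", "while", "do") for w in words),
        sum(w in ("if", "else", "switch") for w in words),
        len(set(words)),
    ]


def metric_similarity(m1, m2):
    """Assumed aggregate: mean of per-metric min/max ratios, in [0, 1]."""
    scores = []
    for x, y in zip(m1, m2):
        hi = max(x, y)
        scores.append(1.0 if hi == 0 else min(x, y) / hi)
    return sum(scores) / len(scores)


def detect(files, threshold=0.8):
    """Stage 1 keeps pairs with similar metrics; stage 2 string-matches them."""
    names = sorted(files)
    vectors = {name: metrics(files[name]) for name in names}
    results = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if metric_similarity(vectors[a], vectors[b]) >= threshold:
                ratio = SequenceMatcher(None, files[a], files[b]).ratio()
                results.append((a, b, ratio))
    return results


loop_code = "int main() {\n  for (int i = 0; i < 3; i++) {\n    if (i) x++;\n  }\n}\n"
submissions = {"a.c": loop_code, "b.c": loop_code, "c.c": "print('hi')\n"}
suspects = detect(submissions)
```

The metric filter is cheap, so the more expensive string comparison only runs on the pairs it leaves; here `c.c` is never string-matched against the other two files.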
- Conference Article
8
- 10.1109/icaecc54045.2022.9716671
- Jan 10, 2022
Taking someone else’s work and claiming it as your own is termed as plagiarism. Plagiarism is a concerning issue in every field of education. There are various tools to detect plagiarism and help maintain the necessary integrity. This paper deals with plagiarism in the specific category of C programming assignments. Various machine learning and deep learning methods are investigated in detail along with the pros and cons. Concepts such as KNN, SVM, D-Trees, RNNs, and attention based transformer networks are tested for their effectiveness in detecting plagiarism in source code. A comprehensive dataset consisting of code pairs was prepared during the course of this research. Results obtained show that Machine Learning and Deep Learning methods provide better accuracy at detecting plagiarism than the current state of the art plagiarism detectors that use a text based approach. A tool is also presented to utilize the built software to detect plagiarism in source code.
- Conference Article
17
- 10.1109/telfor.2017.8249481
- Nov 1, 2017
Computing education involves practical training through programming assignments, which are frequent targets for plagiarism. For that reason, software systems for source code similarity detection are used to prevent this inappropriate behavior. In this paper, different aspects of source code plagiarism in the academic environment are discussed, and three main improvements to plagiarism detection systems are proposed: parallelization of similarity detection algorithms, similarity network visualization, and analysis of results using social network analysis.
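The similarity network idea in this abstract can be sketched in Python: submissions become nodes, pairs whose similarity exceeds a threshold become edges, and simple network measures highlight groups of related submissions. The threshold of 0.7, the student names and the choice of connected components and degree centrality as the analysis are assumptions for illustration, not details taken from the paper.

```python
def build_network(pair_scores, threshold=0.7):
    """Undirected graph: an edge joins two submissions scoring >= threshold."""
    graph = {}
    for (a, b), score in pair_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Connected components with more than one member (suspected groups)."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:
            result.append(sorted(component))
    return result


def degree_centrality(graph):
    """Share of other submissions each node is linked to."""
    n = len(graph)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for node, nbrs in graph.items()}


# Hypothetical pairwise similarity scores from any detection engine.
scores = {
    ("ann", "bob"): 0.9,
    ("bob", "cat"): 0.8,
    ("ann", "cat"): 0.2,
    ("dan", "eve"): 0.1,
}
network = build_network(scores)
```

In this example `clusters(network)` groups ann, bob and cat together even though ann and cat are not directly similar, which is the kind of indirect relationship a pairwise report alone would not show.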
- Research Article
6
- 10.1016/j.softx.2024.101755
- May 1, 2024
- SoftwareX
Source code plagiarism is a significant issue in educational practice, and educators need user-friendly tools to cope with such academic dishonesty. This article introduces the latest version of Dolos, a state-of-the-art ecosystem of tools for detecting and preventing plagiarism in educational source code. In this new version, the primary focus has been on enhancing the user experience. Educators can now run the entire plagiarism detection pipeline from a new web app in their browser, eliminating the need for any installation or configuration. Completely redesigned analytics dashboards provide an instant assessment of whether a collection of source files contains suspected cases of plagiarism and how widespread plagiarism is within the collection. The dashboards support hierarchically structured navigation to facilitate zooming in and out of suspect cases. Clusters are an essential new component of the dashboard design, reflecting the observation that plagiarism can occur among larger groups of students. To meet various user needs, the Dolos software stack for source code plagiarism detection now includes a self-hostable web app, a JSON application programming interface (API), a command line interface (CLI), a JavaScript library and a preconfigured Docker container. Clear documentation and a free-to-use instance of the web app can be found at https://dolos.ugent.be. The source code is also available on GitHub.
- Conference Article
12
- 10.1109/icsess47205.2019.9040853
- Oct 1, 2019
Source code plagiarism is a common occurrence in undergraduate computer science education. Studies have indicated that at least 50% of students plagiarize during their undergraduate career. To identify cases of source code plagiarism, many source code plagiarism detection tools have been proposed. However, conclusively determining the effectiveness of these tools at identifying cases of source code plagiarism is difficult. Evaluations are typically performed using unreleased data sets. Without a comprehensive publicly available data set for source code plagiarism detection evaluation, it is difficult to perform unbiased and reproducible evaluations of tools. To address this problem, this paper presents a tool, SPPlagiarise, which is designed to produce simulated source code plagiarism of Java source code. SPPlagiarise applies a random number of semantics-preserving source code obfuscations at random locations to a Java code base to simulate source code plagiarism. In this paper the design of the tool and an evaluation of a generated plagiarism data set are presented.
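The idea of applying a random number of semantics-preserving obfuscations at random locations can be illustrated with a small Python sketch that operates on Java source text. It is not SPPlagiarise: the two transformations (identifier renaming and comment insertion), the function names and the text-based approach are assumptions. Renaming by regular expression only preserves behaviour if the chosen names do not appear inside string literals and the new names do not clash with existing ones, and inserting a line comment assumes no multi-line comment or text block spans the insertion point.

```python
import random
import re


def rename_identifiers(source, mapping):
    """Rename whole-word identifiers according to mapping (old -> new)."""
    if not mapping:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


def insert_comments(source, rng, count):
    """Insert `count` line comments at random line positions."""
    lines = source.split("\n")
    for _ in range(count):
        lines.insert(rng.randrange(len(lines) + 1), "// note")
    return "\n".join(lines)


def simulate_plagiarism(source, mapping, seed=0, max_comments=3):
    """Apply a random subset of renames and a random number of comments."""
    rng = random.Random(seed)  # seeded so a generated data set is reproducible
    chosen = {old: new for old, new in mapping.items() if rng.random() < 0.5}
    renamed = rename_identifiers(source, chosen)
    return insert_comments(renamed, rng, rng.randint(1, max_comments))


java_source = "int total = 0;\nfor (int i = 0; i < n; i++) {\n    total += i;\n}"
variant = simulate_plagiarism(java_source, {"total": "acc", "i": "k"}, seed=1)
```

A tool working on a parsed syntax tree rather than raw text could apply such transformations safely and support many more of them.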
- Conference Article
6
- 10.1109/imitec52926.2021.9714688
- Nov 23, 2021
Plagiarism during programming assignments is a problem in academia. It hinders the ability of academic instructors to truly judge students' performance and thus prevents students from receiving adequate help from their instructors. In cases where the number of code submissions for a particular assignment is relatively small, the instructor can inspect each code submission to determine whether they are similar. But as the number of code submissions grows, it becomes difficult to detect similarities between them. This creates the need for an automatic source code plagiarism detector. Previous studies showed that the abstract syntax tree (AST) of a source code can provide an accurate representation of the source code for neural network computations. Although one study presented a recursive artificial neural network named Abstract Syntax Tree-based Neural Network (ASTNN) that can represent source codes as vector embeddings using their ASTs, it does not use contrastive learning paradigms, which have been shown to increase the performance of Siamese networks in similarity detection tasks. Therefore, this paper presents an improved version of the ASTNN for code clone detection, in which we modify the original model for contrastive learning. Experiments demonstrated that we outperform the original ASTNN model in code clone detection tasks, with a +5% improvement in the F1-score of our model. This study aims at improving the way we perform similarity detection tasks involving programming languages.
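The contrastive learning objective mentioned in this abstract can be illustrated with the classic margin-based pairwise contrastive loss, written here in plain Python on small embedding vectors. The abstract does not say which contrastive loss the paper uses, so this particular formula, the margin of 1.0 and the Euclidean distance are assumptions; the embeddings would in practice come from the two branches of a Siamese network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_loss(u, v, is_clone, margin=1.0):
    """Pull clone pairs together; push non-clone pairs at least `margin` apart.

    Clone pair:      loss = d^2
    Non-clone pair:  loss = max(0, margin - d)^2
    """
    d = euclidean(u, v)
    if is_clone:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-dimensional embeddings of three code fragments.
frag_a = (0.0, 0.0)
frag_b = (0.0, 0.0)   # embedded identically to frag_a
frag_c = (3.0, 4.0)   # far away from frag_a

loss_close_clones = contrastive_loss(frag_a, frag_b, is_clone=True)
loss_far_non_clones = contrastive_loss(frag_a, frag_c, is_clone=False)
loss_close_non_clones = contrastive_loss(frag_a, frag_b, is_clone=False)
```

The loss is zero both when clones coincide and when non-clones are already further apart than the margin, so training effort concentrates on pairs the model currently gets wrong.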
- Conference Article
14
- 10.5555/2662708.2662718
- May 19, 2013
The advent of the internet and the growth of open source software repositories have made source code readily accessible to software developers. Although reusing source code has its own advantages, care must be taken to ensure that proprietary software does not infringe any licenses. In this context, plagiarism detection plays an important role. In this paper, we propose a robust technique to detect plagiarism in source code. Our approach uses a language-aware token representation, which is resilient to code transformations, and an improved querying and matching technique to detect plagiarism in software code. We evaluated our approach by comparing it with other plagiarism detection tools: Copy Paste Detector (CPD), Sherlock, CCFinder and Plaggie.
- Conference Article
15
- 10.1109/iwsc.2013.6613041
- May 1, 2013
The advent of the internet and the growth of open source software repositories have made source code readily accessible to software developers. Although reusing source code has its own advantages, care must be taken to ensure that proprietary software does not infringe any licenses. In this context, plagiarism detection plays an important role. In this paper, we propose a robust technique to detect plagiarism in source code. Our approach uses a language-aware token representation, which is resilient to code transformations, and an improved querying and matching technique to detect plagiarism in software code. We evaluated our approach by comparing it with other plagiarism detection tools: Copy Paste Detector (CPD), Sherlock, CCFinder and Plaggie.
- Research Article
1
- 10.31039/ljss.2023.6.104
- Sep 17, 2023
- London Journal of Social Sciences
In an era marked by the increasing digitization of society, the issue of source code plagiarism has emerged as a persistent concern. This research paper delves into the problem of source code plagiarism within educational settings, exploring its implications, potential remedies, and the associated hurdles in implementing these solutions. Source code plagiarism involves the unauthorized copying of code without proper attribution, and it has been on the rise in educational institutions due to various contributing factors. This paper sheds light on the educational system's pressures, time constraints, lofty expectations, and the allure of quick completion that make source code plagiarism appealing to students. Furthermore, it highlights the lack of understanding among students regarding academic integrity and citation methods, which exacerbates the problem. Source code plagiarism not only hampers students' intellectual development and problem-solving skills but also undermines the fairness of assessments, posing grading challenges for educators. Nevertheless, there are several potential solutions. While proactive methods focus on prevention through education and policy, reactive methods employ AI-driven plagiarism detectors for detection. However, these solutions are not without their challenges, such as the issue of false positives in plagiarism detection and the potential adversarial response from students. In conclusion, source code plagiarism is a growing problem in modern society that cannot be ignored any longer. Potential solutions to source code plagiarism should be adopted with their drawbacks taken into account. Computer science and programming courses should foster a sense of integrity to avoid source code plagiarism and develop new generations of coders for the future.
- Conference Article
42
- 10.1109/mipro.2016.7522248
- May 1, 2016
Plagiarism is a big concern in academia and it can be a problem in every course. Plagiarism occurs when someone presents others' work as their own. Students plagiarize in different areas: homework assignments, essays, projects, etc. This work focuses on programming courses and plagiarism in programming assignments. While source-code plagiarism detection is in some ways very similar to text plagiarism detection, it is very different in others. As a result, a lot of research focuses specifically on source-code plagiarism. Some questions that are researched in this field are: what is considered plagiarism in programming assignments, how to perform plagiarism detection in programming assignments, how to do it automatically, what tool(s) to use, how students cheat in programming courses, how they try to obfuscate cheating, and many other questions. This work is a review of important research papers in the field of source-code plagiarism detection in academia. This paper tries to answer some of the mentioned research questions and gives indications for future work.