A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges

Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, Masoud Ekhtiarzadeh

doi:10.1016/j.jss.2023.111796

Abstract

Measuring and evaluating source code similarity is a fundamental software engineering activity with a broad range of applications, including but not limited to code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. Notably, we found 12 datasets related to source code similarity measurement and duplicate code, of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.

Journal: Journal of Systems and Software
Publication Date: Jul 5, 2023
Citations: 12

Similar Papers

A systematic literature review on the applications of recurrent neural networks in code clone research
Fahmi H Quradaa ... Rashad S Almoqbily
PloS one | VOL. 19
02 Feb 2024

Development nature matters: An empirical study of code clones in JavaScript applications
Wai Ting Cheung ... Sunghun Kim
Empirical Software Engineering | VOL. 21
24 Mar 2015

Spreadsheets are Code: An Overview of Software Engineering Approaches Applied to Spreadsheets
Felienne Hermans ... David Hoepelman
01 Mar 2016

Eclipse as a software development environment
...
Journal of Computing Sciences in Colleges | VOL. 24
01 Apr 2009
