Scalable code clone detection and search based on adaptive prefix filtering

Manziba Akanda Nishi, Kostadin Damevski

doi:10.1016/j.jss.2017.11.039

Open Access

https://doi.org/10.1016/j.jss.2017.11.039

Journal: The Journal of Systems & Software
Publication Date: Nov 20, 2017
Citations: 39
License type: publisher-specific-oa

Affiliation: Virginia Commonwealth University

Abstract

Code clone detection is a well-known software engineering problem that aims to detect all groups of code blocks or code fragments that are functionally equivalent in a code base. It has numerous and wide-ranging uses in areas such as software metrics, plagiarism detection, aspect mining, copyright infringement investigation, code compaction, virus detection, and bug detection. A scalable code clone detection technique, able to process large source code repositories, is crucial in multi-project or Internet-scale code clone detection scenarios. In this paper, we focus on improving the scalability of code clone detection relative to current state-of-the-art techniques. Our adaptive prefix filtering technique improves the performance of code clone detection for many common execution parameters when tested on common benchmarks. The experimental results show improvements for commonly used similarity thresholds between 40% and 80%, in the best case decreasing execution time by up to 11% and increasing the number of filtered candidates by up to 63%.

Full Text