On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems

Md Sharif Uddin, Kevin A Schneider, Abram Hindle, Chanchal K Roy

doi:10.1109/wcre.2011.12

Abstract

Clone detection techniques essentially cluster textually, syntactically, and/or semantically similar code fragments within or across software systems. For large datasets, similarity identification is costly in both time and memory, especially when detecting near-miss clones, where lines may be modified, added, and/or deleted in the copied fragments. The capability and effectiveness of a clone detection tool depend mostly on the code similarity measurement technique it uses. A variety of similarity measurement approaches have been used for clone detection, including fingerprint-based approaches, which have had varying degrees of success notwithstanding some limitations. In this paper, we investigate the effectiveness of simhash, a state-of-the-art fingerprint-based data similarity measurement technique, for detecting both exact and near-miss clones in large-scale software systems. Our experimental data show that simhash is indeed effective in identifying various types of clones in a software system despite wide variations in experimental circumstances. The approach is also suitable as a core capability for building other tools, such as tools for incremental clone detection, code searching, and clone management.
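
For readers unfamiliar with the technique, the following is a minimal sketch of the general simhash idea: each code fragment is reduced to a fixed-width fingerprint, and two fragments are compared by the Hamming distance between their fingerprints. It is not the paper's implementation. The regex tokenizer, the use of MD5 as the per-token hash, the 64-bit width, and the function names `tokens`, `simhash`, and `hamming` are all assumptions chosen for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption for this sketch


def tokens(code):
    """Split code into identifiers, numbers, and single symbols (whitespace is dropped)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def simhash(code):
    """Return a 64-bit simhash fingerprint of a code fragment."""
    counts = [0] * BITS
    for tok in tokens(code):
        # Hash each token to 64 bits (first 8 bytes of MD5; any stable hash would do).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its positive votes outnumber the negative ones.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In a sketch like this, exact clones (up to whitespace) yield a distance of 0, and near-miss clones would typically be reported when the distance falls below some small threshold. The threshold and any indexing scheme needed to avoid pairwise comparison on large systems are left open here.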

Similar Papers

SimCad: An extensible and faster clone detection tool for large scale software systems
Md Sharif Uddin ... Chanchal K Roy
01 May 2013

SCCD-GAN: An Enhanced Semantic Code Clone Detection Model Using GAN
Kun Xu ... Yan Liu
17 Dec 2021

A Metrics-Based Data Mining Approach for Software Clone Detection
Salwa K Abd-El-Hafiz
01 Jul 2012

Clonepedia: Summarizing Code Clones by Common Syntactic Context for Software Maintenance
Yun Lin ... Wenyun Zhao
01 Sep 2014
