Abstract

In modern software development, code reuse is widely practiced to save time. Reuse is often done by copying and pasting existing code, a practice known as code cloning. Cloning makes software harder to debug and maintain and increases the time spent on both. Various clone detection algorithms have been proposed in the literature, but many of them need considerable time and space to find clones, and most are not scalable. This paper targets that problem. The authors propose a new clone detection method that takes less time than many popular code clone detection algorithms. The proposed framework also addresses a key issue in code clone detection, namely the detection of near-miss (Type-3) and semantic (Type-4) clones, with accuracies of 95.52% and 92.80%, respectively. The approach has two phases. The first phase converts any code into an intermediate representation, hash-inspired abstract syntax trees. In the second phase, these abstract syntax trees are passed to a novel “Similarity-based self-adjusting hash inspired abstract syntax tree” algorithm, which determines the similarity level of the code fragments. The proposed method shows improvement over existing code clone detection methods.
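The two phases described above (code to hashed syntax trees, then a similarity score) can be illustrated with a minimal sketch. This is not the authors' SSA-HIAST algorithm, whose details are not given in the abstract. It swaps in a generic technique: hash every AST subtree by node type only, then compare the two multisets of hashes with a Jaccard-style ratio. The function names `subtree_hashes` and `similarity`, the use of SHA-1, and the choice to ignore identifiers and literal values are all assumptions made for illustration.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> Counter:
    """Return a multiset of hashes, one per AST subtree of `source`.

    Assumption of this sketch: a node is labelled by its type name only,
    so renamed identifiers and changed literals hash identically.
    """
    hashes = Counter()

    def visit(node: ast.AST) -> str:
        # Hash the children first, then combine them with this node's type.
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        payload = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        hashes[digest] += 1
        return digest

    visit(ast.parse(source))
    return hashes


def similarity(source_a: str, source_b: str) -> float:
    """Multiset Jaccard ratio of subtree hashes, in the range [0, 1]."""
    a, b = subtree_hashes(source_a), subtree_hashes(source_b)
    shared = sum((a & b).values())  # multiset intersection (min of counts)
    total = sum((a | b).values())   # multiset union (max of counts)
    return shared / total if total else 1.0


ORIGINAL = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Same structure with renamed identifiers.
RENAMED = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for item in items:\n"
    "        acc += item\n"
    "    return acc\n"
)
# Renamed copy with one inserted statement (a near-miss clone).
GAPPED = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for item in items:\n"
    "        acc += item\n"
    "    print(acc)\n"
    "    return acc\n"
)

if __name__ == "__main__":
    print(similarity(ORIGINAL, RENAMED))
    print(similarity(ORIGINAL, GAPPED))
```

Under these assumptions the renamed copy scores as identical to the original, while the copy with an inserted statement scores below 1 because the enclosing function and module subtrees no longer hash the same. A purely structural score like this cannot recognize semantic (Type-4) clones, which is the harder case the paper addresses.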

Highlights

  • The computer industry has grown significantly over the past years

  • Present software and operating systems are composed of millions of lines of code (LOC) that work to achieve a common objective with high efficiency and effectiveness

  • Rotations are performed based on the value of a threshold (Th)


Summary

Introduction

The computer industry has grown significantly over the past years, and high-quality software and operating systems have played a major role in driving this growth. Software maintenance is highly dependent on the practices that were used to build the software. One such practice that programmers use when writing code is code cloning. In a heavily cloned system, a programmer making a modification has to carefully apply it to every cloned sub-system. This phenomenon is known as “bug propagation” [4]. Type 3 clones, also known as “gapped clones”, are code clones that differ at the statement level. Two code fragments are said to be semantically similar if they are similar at the functional level while completely different textually. These are Type 4 clones, and they are the hardest to find. Most of the literature [8,9,10] studied by the authors focuses on Type 1 and Type 2 clones.
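The Type 3 and Type 4 definitions above can be made concrete with a small example. The three functions below are hypothetical and written only for illustration; they are not taken from the paper or its benchmark data.

```python
def sum_of_squares(values):
    """Reference fragment."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_checked(values):
    """Type 3 (gapped) clone of the reference fragment.

    It is a copy with statements inserted, so it differs from the
    original at the statement level.
    """
    total = 0
    for v in values:
        if v is None:  # inserted statement: skip missing entries
            continue
        total += v * v
    return total


def sum_of_squares_functional(values):
    """Type 4 (semantic) clone of the reference fragment.

    It computes the same result but shares almost no text or
    syntactic structure with the original.
    """
    return sum(map(lambda v: v ** 2, values))


if __name__ == "__main__":
    data = [1, 2, 3]
    print(sum_of_squares(data))
    print(sum_of_squares_checked(data))
    print(sum_of_squares_functional(data))
```

A textual or token-based comparison would still find most of the reference fragment inside the Type 3 variant, whereas the Type 4 variant matches only in behaviour, which is consistent with the statement that Type 4 clones are the hardest to find.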

Runtime Complexity
State of the Art for Type 3 Clones
State of the Art for Type 4 Clones
Latest Work on Code Clone Detection
SSA-HIAST Framework
Phase 2
Syntactic Similarity
Experimental Setup
Evaluation Criteria
Benchmarking Against the State of the Art
Conclusion and Future Scope
