Abstract

This article proposes a high-speed, high-accuracy code clone detection method that combines tree-based and token-based methods. Duplicated program code, called a code clone, is one of the main factors that reduce the quality and maintainability of software. If a code fragment contains faults (bugs) and is copied and modified in other locations, every copy must be corrected. However, it is not easy to find all code clones in large and complex software. Much research has been devoted to code clone detection, and there are two main approaches: token-based and tree-based. The token-based method is fast and requires fewer resources, but it cannot detect all kinds of code clones. The tree-based method can detect all kinds of code clones, but it is slow and requires substantial computing resources. This paper proposes a combination of the two methods to improve the efficiency and accuracy of code clone detection. First, candidate code clones are extracted by the token-based method, which is fast and lightweight. The selected candidates are then checked more precisely by the tree-based method, which can find all kinds of code clones. A prototype system was developed. The system accepts source code and tokenizes it in the first step. The token-based method is then applied to the token sequence to find candidate code clones. After the candidates are extracted, the selected source code is converted into abstract syntax trees (ASTs) so that the tree-based method can be applied. Sample source code was used to evaluate the proposed method. The evaluation showed improved efficiency and precision in code clone detection.

Highlights

  • This article proposes a high-speed, high-accuracy code clone detection method

  • We propose a new method that can detect all types of code clones and runs relatively fast

  • The method has four steps: 1) lexically analyze the source code to generate a token sequence, 2) apply the token-based method to extract code clone candidates, 3) generate abstract syntax trees (ASTs) for the candidates, 4) compare the ASTs to determine code clones of all types
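Steps 1 and 2 can be illustrated with a minimal sketch. It is not the authors' prototype: it assumes the input is Python source, uses the standard `tokenize` module as the lexer, and uses a simple sliding-window match over normalized tokens as a stand-in for the token-based method. The function names `normalized_tokens` and `candidate_pairs` and the `window` parameter are hypothetical.

```python
import io
import keyword
import tokenize
from collections import defaultdict

# Token kinds that carry no structural information for clone matching.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def normalized_tokens(source):
    """Step 1 (assumed form): tokenize and abstract identifiers and literals.

    Returns a list of (symbol, line_number) pairs.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append(("ID", tok.start[0]))    # any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append(("LIT", tok.start[0]))   # any literal
        else:
            out.append((tok.string, tok.start[0]))  # keywords, operators
    return out

def candidate_pairs(source, window=8):
    """Step 2 (assumed form): report pairs of start lines whose next
    `window` normalized tokens are identical."""
    toks = normalized_tokens(source)
    buckets = defaultdict(list)
    for i in range(len(toks) - window + 1):
        key = tuple(sym for sym, _ in toks[i:i + window])
        buckets[key].append(toks[i][1])
    pairs = set()
    for lines in buckets.values():
        for a in lines:
            for b in lines:
                if a < b:
                    pairs.add((a, b))
    return sorted(pairs)
```

Because identifiers and literals are abstracted before matching, two functions that differ only in names and constants are reported as a candidate pair. The candidates are cheap to compute but may include false positives, which is why a more precise second stage is applied to them.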


Summary

Introduction

This article proposes a high-speed, high-accuracy code clone detection method. A code clone is a fragment of source code that is identical or similar to another portion of source code [1]. Several methods for detecting code clones have been proposed in previous work. These methods follow one of two approaches: token-based [5] [6] or tree-based [7] [8]. Tree-based methods can detect all types of code clones but require large computing resources (CPU time and memory). We propose a new method that can detect all types of code clones and runs relatively fast. First, the fast token-based method extracts candidate code clones. By combining the token-based and tree-based methods, our method can detect all types of code clones faster than a tree-based method alone.
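The tree-based check on a candidate pair can be sketched as follows. This is an illustrative assumption rather than the paper's algorithm: it takes Python fragments, parses them with the standard `ast` module, replaces names and constants with placeholders, and compares the resulting tree dumps. The names `_Normalizer`, `ast_shape`, and `is_clone` are hypothetical.

```python
import ast

class _Normalizer(ast.NodeTransformer):
    """Replace identifiers and constants with placeholders so that only
    the tree structure remains."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node

    def visit_Constant(self, node):
        node.value = None
        return node

def ast_shape(source):
    """Return a string describing the normalized AST of `source`."""
    tree = _Normalizer().visit(ast.parse(source))
    # ast.dump omits line and column attributes by default, so two
    # fragments at different positions can still compare equal.
    return ast.dump(tree, annotate_fields=False)

def is_clone(fragment_a, fragment_b):
    """True when both fragments have the same normalized AST."""
    return ast_shape(fragment_a) == ast_shape(fragment_b)
```

Exact equality of normalized trees covers identical fragments and fragments that differ only in names or literal values. Clones with inserted or deleted statements would need a tree similarity measure instead of equality, which this sketch does not attempt.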

Definitions and Types of Code Clones
Related Works of Clone Detection
Issues of Previous Works
Overall of Proposed Method
Gap of Diagonal Line
Applying Tree-Based Method
Overall of Prototype
Experimental Result
Computing Time Comparison
Quality of Proposed Method
Findings
Conclusions
