Code Clone Detection Method Based on the Combination of Tree-Based and Token-Based Methods

Ryota Ami, Hirohide Haga

doi:10.4236/jsea.2017.1013051

Abstract

This article proposes a fast and accurate code clone detection method that combines tree-based and token-based methods. Duplicated program code, called a code clone, is one of the main factors that reduce the quality and maintainability of software. If a code fragment contains faults (bugs) and is copied and modified in other locations, every copy must be corrected. However, finding all code clones in large and complex software is not easy. Much research has been devoted to code clone detection, and there are two main approaches: token-based and tree-based. The token-based method is fast and requires few resources, but it cannot detect all kinds of code clones. The tree-based method can detect all kinds of code clones, but it is slow and requires considerable computing resources. This paper proposes combining the two methods to improve the efficiency and accuracy of code clone detection. First, candidate code clones are extracted by the fast, lightweight token-based method. The selected candidates are then checked more precisely by the tree-based method, which can find all kinds of code clones. A prototype system was developed. It accepts source code and tokenizes it in the first step, then applies the token-based method to the token sequence to find clone candidates. After the candidates are extracted, the selected source code is converted into abstract syntax trees (ASTs) so that the tree-based method can be applied. Sample source code was used to evaluate the proposed method, and the evaluation showed improved efficiency and precision in code clone detection.

Highlights

This article proposes a fast and accurate code clone detection method
We propose a new method that can detect all types of code clones and runs relatively fast
1) Lexically analyze the source code and generate a token sequence
2) Apply the token-based method to extract code clone candidates
3) Generate abstract syntax trees (ASTs) for the code clone candidates
4) Compare the ASTs to determine code clones of all types
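
Steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' prototype: the choice of Python, the helper names `normalized_tokens` and `clone_candidates`, the window size of 8 tokens, and the idea of grouping identical fixed-size token windows are all assumptions made for illustration. Identifiers and literals are replaced by placeholders so that renamed copies still produce the same token sequence.

```python
import io
import keyword
import tokenize
from collections import defaultdict

# Token kinds that carry no lexical content for clone matching.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source):
    """Return (line_number, token) pairs with identifiers and literals abstracted."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            text = "ID"    # any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            text = "LIT"   # any numeric or string literal
        else:
            text = tok.string
        result.append((tok.start[0], text))
    return result


def clone_candidates(source, window=8):
    """Group the start lines of token windows whose normalized content is identical."""
    tokens = normalized_tokens(source)
    buckets = defaultdict(set)
    for i in range(len(tokens) - window + 1):
        key = tuple(text for _, text in tokens[i:i + window])
        buckets[key].add(tokens[i][0])
    # A window is a candidate only if it starts on at least two different lines.
    return sorted({tuple(sorted(lines)) for lines in buckets.values() if len(lines) > 1})
```

Each returned tuple lists line numbers where the same normalized token window begins. In the combined approach described here, only such candidate regions would be passed on to the more expensive AST comparison.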

Summary

Introduction

This article proposes a fast and accurate code clone detection method. A code clone is a fragment of source code that is identical or similar to another portion of source code [1]. Several methods for detecting code clones have been proposed in previous work. They are based on two principles: the token-based method [5] [6] and the tree-based method [7] [8]. Tree-based methods can detect all types of code clones but require large computing resources (CPU time and memory). We propose a new method that can detect all types of code clones and runs relatively fast. The fast token-based method first extracts candidate code clones. By combining the token-based and tree-based methods, our method can detect all types of code clones faster.
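
The tree-based check on a pair of candidate fragments can be sketched as follows. This is an illustration under stated assumptions, not the comparison algorithm used in the paper: it relies on Python's standard `ast` module, flattens each tree into a pre-order list of node type names, and scores the two lists with `difflib.SequenceMatcher`. The helper names `shape` and `ast_similarity` are hypothetical.

```python
import ast
import difflib


def shape(node):
    """Pre-order list of AST node type names.

    Identifier names and literal values are not recorded, so fragments that
    differ only in naming or constants produce the same list.
    """
    out = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        out.extend(shape(child))
    return out


def ast_similarity(src_a, src_b):
    """Return a similarity score in [0, 1] between the ASTs of two fragments."""
    a = shape(ast.parse(src_a))
    b = shape(ast.parse(src_b))
    # autojunk=False keeps frequent node types (e.g. Name, Load) in the match.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A score of 1.0 means the two trees have the same structure up to renaming; a fragment with an inserted or deleted statement scores below 1.0 but stays high. Any cutoff used to accept a pair as a clone would be a tuning choice and is not taken from the paper.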

Definitions and Types of Code Clones

Related Works of Clone Detection

Issues of Previous Works

Overall of Proposed Method

Gap of Diagonal Line

Applying Tree-Based Method

Overall of Prototype

Experimental Result

Computing Time Comparison

Quality of Proposed Method

Findings

Conclusions

Journal: Journal of Software Engineering and Applications
Publication Date: Jan 1, 2017
Citations: 3
License type: CC BY 4.0

Similar Papers

Java Code Clone Detection by Exploiting Semantic and Syntax Information From Intermediate Code-Based Graph
Dawei Yuan ... Tao Zhang
IEEE Transactions on Reliability | VOL. 72
01 Jun 2023

Neural Detection of Semantic Code Clones Via Tree-Based Convolution
Hao Yu ... Long Chen
01 May 2019

Semantic Code Clone Detection Based on Community Detection
Zexuan Wan ... Chunli Xie
International Journal of Software Engineering and Knowledge Engineering | VOL. 34
26 Jul 2024

SCDetector
Yueming Wu ... Hai Jin
21 Dec 2020
