Improving Maintenance-Consistency Prediction During Code Clone Creation

Fanlong Zhang, Xiaohong Su, Siau-Cheng Khoo

doi:10.1109/access.2020.2990645

Open Access

Abstract

Developers frequently introduce code clones into software through copy-and-paste operations during development in order to shorten development time. Not all such clone creations benefit software maintenance: they may require extra effort in the maintenance phase, where additional care is needed to ensure consistent change among these clones; i.e., changes made to one piece of code may need to be propagated to its clones. Failing to do so risks introducing bugs into the software, which are usually called consistent defects. In response to the maintenance cost caused by the introduction of new clones, some researchers have advocated using machine-learning approaches to predict the likelihood that consistent change will be required when clones are freshly introduced. Leading this line of work is the approach of Wang et al., which uses a Bayesian network to model the maintenance consistency of newly introduced clones. In this work, we build on that approach by providing a revised set of attributes that has been shown to strengthen the predictive power of the Bayesian network model, as measured quantitatively by precision and recall. We first define the clone consistency-maintenance requirement, which turns the problem into a classification problem. Then, after collecting all clone creation operations by traversing clone genealogies, we redesign the attribute sets to represent clone creation with more information from the code and context perspectives. We evaluate the effectiveness of our approach on four open-source projects with more quantitative analysis, and the experimental results show that our approach predicts clone consistency effectively. To put this work into practice, we develop an Eclipse plug-in that makes this prediction available to developers during software development and maintenance.

Highlights

The ever-increasing demand to shorten time to market has put immense pressure on software developers to find ways to develop their code quickly.
In this work, we present a novel definition of consistent change to a clone group that requires only that at least two of its code clones change consistently, as follows. Definition 1 (Consistent Change): A clone group CG′ in software version j + 1 possesses consistent change if there exists a pair of clones CF1′ and CF2′ in CG′ that are mappable to a pair of clones CF1 and CF2 in a clone group CG in version j such that the modification of the code pair from (CF1, CF2) to (CF1′, CF2′) satisfies the following: Simtext(CFi, CFi′) < 1 ∀i ∈ {1, 2}
To help developers maintain such consistency, we suggest performing clone consistency prediction when changing the code clones in a group (interested readers can refer to [8] for discussion), and performing consistent changes automatically with JSync, the tool developed by Nguyen et al. [30].
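
The condition quoted in Definition 1 can be illustrated with a minimal Python sketch. It covers only the part of the definition stated here: at least two clones in the group, each mappable to a clone in the previous version, have a text similarity below 1 with their earlier form. The names `sim_text` and `has_consistent_change`, the use of `difflib.SequenceMatcher` as the similarity measure, and the mapping of clones across versions by a shared identifier are all assumptions made for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict


def sim_text(old: str, new: str) -> float:
    """Stand-in for Simtext: similarity in [0, 1], where 1.0 means identical text.

    Assumption: the source does not specify the measure here, so difflib is used.
    """
    if old == new:
        return 1.0
    return SequenceMatcher(None, old, new).ratio()


def has_consistent_change(old_group: Dict[str, str], new_group: Dict[str, str]) -> bool:
    """Check the quoted condition for a clone group across two versions.

    Both arguments map a hypothetical clone identifier to the clone's source text;
    a clone in the new version is treated as mappable if its identifier also
    appears in the old version.
    """
    mapped = [cid for cid in new_group if cid in old_group]
    # Look for a pair of mapped clones that were BOTH modified (similarity < 1).
    return any(
        all(sim_text(old_group[cid], new_group[cid]) < 1.0 for cid in pair)
        for pair in combinations(mapped, 2)
    )


if __name__ == "__main__":
    version_j = {"a": "total = total + x", "b": "total = total + x", "c": "total = total + x"}
    version_j1 = {"a": "total += abs(x)", "b": "total += abs(x)", "c": "total = total + x"}
    print(has_consistent_change(version_j, version_j1))
```

In this usage example two of the three clones are modified between versions, so the group satisfies the quoted condition even though the third clone is left unchanged.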

Summary

Introduction

The ever-increasing demand to shorten time to market has put immense pressure on software developers to find ways to develop their code quickly. One technique that is used prevalently, and even encouraged in school, is to reuse existing code fragments. Among these forms of reuse, the copy-and-paste operation is by far the simplest and most widely used. Once code has been copied and pasted, the developer may decide to modify it slightly, or not at all, creating many similar code fragments within a piece of software. Two similar pieces of code are known as clones of each other [1], and operations such as "copy-and-paste" are called clone-creating operations. Depending on the "similarity" criteria, two code clones may be identical, syntactically similar, or semantically similar.

Methods

Results

Conclusion