Problematic Code Clones Identification Using Multiple Detection Results

Yoshiki Higo, Ken-Ichi Sawa, Shinji Kusumoto

doi:10.1109/apsec.2009.30

Abstract

Most code clones are generated by copy-and-paste programming. Copy-and-paste programming shortens the time required for implementation because the pasted code serves as a template for the required functionality. However, it sometimes introduces new bugs into the source code. After copy-and-paste, the pasted code is modified somewhat to fit the context of the surrounding region. For example, some identifiers are replaced with other identifiers, or a few statements are inserted, deleted, or changed. If such modifications are performed incorrectly, bugs occur in the code clones. However, not all code clones are problematic; many have sound reasons for their existence. Consequently, simple code clone detection is inefficient for identifying problematic code clones. First, this paper proposes a classification scheme for separating problematic code clones from non-problematic ones. Second, it proposes a method for extracting the code clones classified as problematic. Third, it presents the results of case studies conducted to evaluate the proposed method. The proposed method uses multiple code clone detection tools and does not directly analyze program source code. After multiple detections, simple operations are performed to extract code clones that are likely to be problematic. In the case studies, conducted on an open source software system, the proposed method identified 22 problematic code clones.
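
The abstract does not say which "simple operations" are applied to the detection results. As a minimal sketch, assuming one plausible reading, the example below takes the set difference between the output of a lenient detector (one that tolerates renamed identifiers or edited statements) and the output of a strict detector (exact matches only). The fragment representation, the function names, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Set, Tuple

# Assumed representation: a fragment is (file path, start line, end line),
# and a clone pair is an unordered pair of fragments.
Fragment = Tuple[str, int, int]
ClonePair = FrozenSet[Fragment]


def make_pair(a: Fragment, b: Fragment) -> ClonePair:
    """Build an order-independent clone pair."""
    return frozenset((a, b))


def candidate_problematic(
    lenient: Set[ClonePair], strict: Set[ClonePair]
) -> Set[ClonePair]:
    """Return pairs reported by the lenient detector but not the strict one.

    Under the assumption above, these are clones that were modified after
    pasting, which is where incorrect edits could hide.
    """
    return lenient - strict


# Hypothetical detector outputs (not real data from the case studies).
strict_result = {make_pair(("a.c", 10, 20), ("b.c", 30, 40))}
lenient_result = {
    make_pair(("a.c", 10, 20), ("b.c", 30, 40)),  # identical copy
    make_pair(("a.c", 50, 65), ("c.c", 5, 20)),   # copy with edits
}

candidates = candidate_problematic(lenient_result, strict_result)
```

Here only the pair involving `c.c` remains as a candidate, because the exact copy is reported by both detectors. The operation works purely on detection results, which matches the abstract's statement that the method does not analyze source code directly.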

Full Text