Abstract

Finding code snippets that are the same as or similar to a query snippet is one of the fundamental activities in software maintenance. Code clone detectors find such snippets, but they report all of the code clone pairs in the target, which is generally far more than the user needs. In this paper, we propose ccgrep, a token-based pattern matching tool built on the notion of code clone pairs. The user simply inputs a code snippet as a query, specifies the target source code, and gets the matched code snippets as the result. The query and each result snippet form a clone pair. Special tokens (called meta-tokens) in the query give the user precise control over the matching. ccgrep works on source code in C, C++, Java, and Python, on Windows or Unix, with practical scalability and performance. The evaluation results show that ccgrep is effective in finding intended code snippets in large Open Source Software.

Highlights

  • Finding and locating the same or similar code snippets in source code files is a fundamental activity in software development and maintenance, and various kinds of software engineering tools or IDEs have been proposed and implemented[19].

  • A large body of scientific literature on clone detection has been published and various kinds of code clone detection tools have been developed[18, 20]. These code clone detectors are candidates for finding similar code snippets, but most of them are designed to detect all of the code clone pairs in the target, which is generally excessive for a user who wants to search for a specific query snippet.

  • It has been reported that grep[8], a character-based pattern matching tool, is widely used in software engineering practice to find lines with a specific keyword[14, 21]. However, writing a grep query for a code snippet that spans multiple lines takes some skill and effort.


Summary

Introduction

Finding and locating the same or similar code snippets in source code files is a fundamental activity in software development and maintenance, and various kinds of software engineering tools or IDEs have been proposed and implemented[19]. A large body of scientific literature on clone detection has been published and various kinds of code clone detection tools (detectors) have been developed[18, 20]. These code clone detectors are candidates for finding similar code snippets, but most of them are designed to detect all of the code clone pairs in the target, which is generally excessive for a user who wants to search for a specific query snippet. ccgrep works on Windows or Unix as a simple but reliable clone detector and pattern matching tool for C, C++, Java, and Python. ccgrep has been used in various applications, and it has shown high scalability and performance on large source-code collections. ccgrep is an Open Source Software system and can be obtained from GitHub.
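To make the idea of token-based matching concrete, the following Python sketch searches a target for a query snippet by comparing token sequences rather than characters, so whitespace, line breaks, and comments do not affect the match. It is not ccgrep's implementation and does not use ccgrep's meta-token syntax: the small C-like lexer, the keyword list, the function names, and the `normalize_identifiers` flag are all assumptions made for illustration. With the flag off, the sketch finds exact token matches (roughly, Type 1 clones); with it on, any identifier matches any identifier, which is a deliberately loose approximation of Type 2 clones.

```python
import re

# Hypothetical, simplified lexer for a C-like language (an assumption for
# illustration; this is not the lexer used by ccgrep).
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)      # line and block comments
  | (?P<ident>[A-Za-z_]\w*)              # identifiers and keywords
  | (?P<number>\d+(?:\.\d+)?)            # integer or decimal literals
  | (?P<string>"(?:\\.|[^"\\])*")        # double-quoted string literals
  | (?P<punct>==|!=|<=|>=|&&|\|\||\+\+|--|[^\s\w])  # operators, punctuation
""", re.VERBOSE | re.DOTALL)

KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}


def tokenize(source):
    """Return a list of (kind, text, line) tuples, skipping comments."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind == "comment":
            continue
        text = m.group()
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        line = source.count("\n", 0, m.start()) + 1
        tokens.append((kind, text, line))
    return tokens


def tokens_match(q_tok, t_tok, normalize_identifiers):
    """Compare one query token with one target token."""
    q_kind, q_text, _ = q_tok
    t_kind, t_text, _ = t_tok
    if q_kind != t_kind:
        return False
    if normalize_identifiers and q_kind == "ident":
        return True  # any identifier matches any identifier
    return q_text == t_text


def find_matches(query, target, normalize_identifiers=False):
    """Return (first_line, last_line) for each match of query in target."""
    q = tokenize(query)
    t = tokenize(target)
    if not q:
        return []
    hits = []
    for start in range(len(t) - len(q) + 1):
        window = t[start:start + len(q)]
        if all(tokens_match(a, b, normalize_identifiers)
               for a, b in zip(q, window)):
            hits.append((window[0][2], window[-1][2]))
    return hits


target = """int sum(int a, int b) {
    /* add */
    return a
        + b;
}
int total(int x, int y) { return x + y; }
"""
query = "return a + b;"

print(find_matches(query, target))
# -> [(3, 4)]
print(find_matches(query, target, normalize_identifiers=True))
# -> [(3, 4), (6, 6)]
```

The one-line query matches the statement that is split across lines 3 and 4 of the target, which a line-oriented character search would miss without a carefully written multi-line pattern. The naive sliding window used here is quadratic in the worst case and is only meant to show the matching idea.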

Motivating Example
Basic Features
Query for Type 1 Clone
Query for Type 2 Clone
Query for Type 3 Clone
Finding Various Code Snippets
Architecture of ccgrep
Evaluation
RQ1: Query Expressiveness
RQ2: Accuracy of ccgrep
RQ3: Performance of ccgrep
Related Works
Conclusions