Abstract

This research presents a parallel and distributed data mining approach to code clone detection. It aims to prove the value and importance of deploying parallel and distributed computing for real-time large scale code clone detection. It is implemented this approach in a family of clone detectors, called PD EgyCD (Parallel and Distributed Egypt Clone Detector). In this approach, This research builds on an earlier work of the authors for code clone and plagiarism detection using sequential pattern mining by adding parallelism and distribution to our earlier tool EgyCD. Our approach uses data mining through a tailored Apriori-based algorithm for code clone detection. And it uses parallelization and distribution to achieve excellent performance to scale up to clone detection on very large systems. This approach has been implemented as a database application which leverages the capabilities of modern database tools. Two versions have been developed of this distributed technique. The first one uses client-server technique in which all clients and the server deal with only one database. The second one uses agents where each client acts as a separate agent and has its own database and after working on a sub-problem, it submits its partial solution to the server to finally get the complete solution (set of code clones). Experiments show that agents technique is faster than clientserver one. Distribution enhances performance very much. Speed improvement is a function of the number of clients/agents used. Our conclusion is that data mining, combined with parallel and distributed computing, can efficiently be deployed for code clone detection of very large systems.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call