Abstract

Code clones can be defined as two pieces of code that have the same or similar functionality. Code clone detection is critical for improving and sustaining code quality. Current methods are unable to extract semantic and syntactic features and classify code bases satisfactorily. We propose a novel two-stage machine-learning approach to code clone detection. First, multiple intermediate representations of source code are extracted and combined to generate a holistic embedding based on a recently proposed technique. Next, we use these embeddings to train an Intermediate Merge Siamese Neural Network to detect functional code clones. Siamese Neural Networks are a state-of-the-art machine-learning architecture particularly suited to code clone detection. This novel combination allows the model to learn subtle syntactic and semantic features and to identify previously undetectable similarities. Experimental evaluation on the OJClone C++ dataset shows that our solution significantly improves code clone detection.

Keywords: Functional code clones; Abstract Syntax Tree (AST); Control Flow Graph (CFG); Deep learning; Siamese Neural Network
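The two-stage pipeline can be illustrated with the minimal, untrained sketch below, written in pure Python. It is not the authors' implementation. The abstract names neither the embedding technique nor the exact merge layer, so the sketch makes two assumptions. First, `holistic_embedding` simply concatenates per-representation vectors, such as one vector from an AST and one from a CFG. This is a stand-in for the "recently proposed technique", not that technique. Second, "intermediate merge" is taken to mean that the two branch outputs are combined as |h1 - h2| and h1 * h2 and then passed through a further layer before the clone score. All names here (`holistic_embedding`, `IntermediateMergeSiamese`, `clone_probability`) are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def holistic_embedding(representations: List[Vector]) -> Vector:
    """Stage one (stand-in): concatenate per-representation embeddings,
    e.g. one vector derived from the AST and one from the CFG."""
    combined: Vector = []
    for vec in representations:
        combined.extend(vec)
    return combined


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer with tanh activation: tanh(W x + b)."""
    return [math.tanh(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_matrix(rng: random.Random, rows: int, cols: int) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class IntermediateMergeSiamese:
    """Hypothetical stage two: a Siamese network whose two branches share
    encoder weights, are merged at an intermediate layer, and pass through
    one more dense layer before a logistic clone score. Weights are random;
    training (e.g. on labelled clone / non-clone pairs) is omitted."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Shared encoder, applied to both inputs.
        self.enc_w = random_matrix(rng, hidden_dim, in_dim)
        self.enc_b = [0.0] * hidden_dim
        # Post-merge layer: the merged vector has 2 * hidden_dim entries.
        self.post_w = random_matrix(rng, hidden_dim, 2 * hidden_dim)
        self.post_b = [0.0] * hidden_dim
        # Logistic output unit.
        self.out_w = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.out_b = 0.0

    def encode(self, x: Vector) -> Vector:
        # Identical weights for both branches: the defining Siamese property.
        return dense(x, self.enc_w, self.enc_b)

    def clone_probability(self, a: Vector, b: Vector) -> float:
        h1, h2 = self.encode(a), self.encode(b)
        # Intermediate merge: symmetric features, so the score does not
        # depend on which snippet is passed first.
        merged = ([abs(u - v) for u, v in zip(h1, h2)]
                  + [u * v for u, v in zip(h1, h2)])
        z = dense(merged, self.post_w, self.post_b)
        logit = sum(w * z_i for w, z_i in zip(self.out_w, z)) + self.out_b
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical AST- and CFG-derived vectors for two code snippets.
    left = holistic_embedding([[0.2, 0.7, 0.1], [0.5, 0.3]])
    right = holistic_embedding([[0.1, 0.8, 0.1], [0.4, 0.3]])
    model = IntermediateMergeSiamese(in_dim=len(left), hidden_dim=4)
    print(round(model.clone_probability(left, right), 3))
```

Merging on symmetric features (absolute difference and element-wise product) is one common design choice because it makes the clone score order-independent. A late-merge variant would instead compare the two encodings directly, for example with a cosine distance, and would add no layers after the merge.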
