Abstract

Code clones are duplicated code snippets that significantly threaten software maintenance and the quality of public corpora used for code representation learning. Source code is typically represented either by its token context or by its structural information, such as the abstract syntax tree (AST) and the control flow graph (CFG), and both context-based and structure-based models have contributed significantly to the development of code clone detection. In this paper, we present a hybrid embedding model for code clone detection (HEM-CCD), which fuses sequential token information with graph-based structural information. We insert the tokens' global context information, encoded by a bi-directional recurrent neural network, into the AST-based graph to obtain a comprehensive semantic representation of the code. We then feed the graph into a gated graph neural network to generate code semantic vectors for similarity evaluation. We evaluated our model on two public clone datasets (BigCloneBench and GoogleCodeJam), and the results indicate that HEM-CCD outperforms several state-of-the-art approaches.
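To make the graph-then-compare idea concrete, the following is a minimal sketch and not the HEM-CCD implementation. It assumes several simplifications that are not from the paper: Python's `ast` module stands in for the AST extractor (only because it ships with the standard library), hash-derived vectors replace learned embeddings, plain neighbour averaging replaces the gated graph neural network, and the bi-directional recurrent token encoder is omitted entirely. All function names and the `DIM` constant are hypothetical.

```python
import ast
import hashlib
import math

DIM = 8  # hypothetical embedding size, chosen only for illustration


def init_vector(label):
    """Deterministic pseudo-embedding for a node label (stand-in for learned embeddings)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:DIM]]


def build_ast_graph(source):
    """Return (labels, edges): AST node type names and parent-child edges."""
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)  # identifier names are deliberately ignored
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, edges


def propagate(labels, edges, steps=2):
    """Ungated message passing: mix each node state with the mean of its neighbours."""
    states = [init_vector(label) for label in labels]
    neighbours = [[] for _ in labels]
    for a, b in edges:  # treat edges as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)
    for _ in range(steps):
        updated = []
        for i, vec in enumerate(states):
            if neighbours[i]:
                msg = [sum(states[j][d] for j in neighbours[i]) / len(neighbours[i])
                       for d in range(DIM)]
            else:
                msg = [0.0] * DIM
            updated.append([0.5 * vec[d] + 0.5 * msg[d] for d in range(DIM)])
        states = updated
    return states


def code_vector(source):
    """Mean-pool the node states into one vector per code fragment."""
    labels, edges = build_ast_graph(source)
    states = propagate(labels, edges)
    return [sum(s[d] for s in states) / len(states) for d in range(DIM)]


def similarity(src_a, src_b):
    """Cosine similarity between the pooled vectors of two code fragments."""
    u, v = code_vector(src_a), code_vector(src_b)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Because this toy version labels nodes by AST node type only, two fragments that differ only in identifier names produce identical vectors, while structurally different fragments do not. A learned model such as the one described above would additionally use token context, which this sketch leaves out.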

