Abstract
Code clones are duplicated code snippets that significantly threaten both software maintenance and the quality of the public corpora used for code representation learning. Traditionally, code context and structural information, such as the abstract syntax tree (AST) and the control flow graph (CFG), have been the typical representations of source code, and both context-based and structure-based models have contributed significantly to the development of code clone detection. In this paper, we present a hybrid embedding model for code clone detection (HEM-CCD), which fuses token sequential information with graph-based structural information. We insert the tokens' global context information, encoded by a bidirectional recurrent neural network, into an AST-based graph to obtain a comprehensive representation of code semantics. We then feed this graph into a gated graph neural network to generate code semantic vectors for similarity evaluation. We implemented and evaluated our model on two public clone datasets (BigCloneBench and GoogleCodeJam), and the results indicate that HEM-CCD outperforms several state-of-the-art approaches.
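The pipeline described in the abstract (bidirectional RNN token context, inserted into an AST graph, propagated by a gated graph neural network, then compared by similarity) can be illustrated with a small, dependency-free Python sketch. This is an untrained toy under stated assumptions, not the HEM-CCD implementation. The hidden size, the pseudo-random stand-in embeddings, summing forward and backward RNN states, overwriting AST leaf states with the token context vectors, treating AST edges as undirected, mean pooling as the readout, cosine similarity as the metric, and all names (`HybridEncoder`, `embed`, `cosine`) are hypothetical choices made for illustration.

```python
import math
import random

DIM = 4  # hypothetical hidden size, kept small for readability


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs):
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def embed(symbol):
    # Deterministic pseudo-embedding per token or AST label (stand-in for a learned table).
    rng = random.Random(symbol)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class HybridEncoder:
    """Toy BiRNN + GGNN encoder with random (untrained) weights shared by both inputs."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        names = ["Wf", "Uf", "Wb", "Ub", "Wmsg", "Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]
        self.p = {n: rand_matrix(rng, DIM, DIM) for n in names}

    def birnn(self, tokens):
        # Elman RNN run forward and backward; the two states per token are summed.
        p, xs = self.p, [embed(t) for t in tokens]
        fwd, h = [], [0.0] * DIM
        for x in xs:
            h = tanh(add(matvec(p["Wf"], x), matvec(p["Uf"], h)))
            fwd.append(h)
        bwd, h = [], [0.0] * DIM
        for x in reversed(xs):
            h = tanh(add(matvec(p["Wb"], x), matvec(p["Ub"], h)))
            bwd.append(h)
        bwd.reverse()
        return [add(f, b) for f, b in zip(fwd, bwd)]

    def ggnn(self, states, edges, steps=3):
        # Gated propagation: sum neighbour messages, then a GRU-style update per node.
        p = self.p
        nbrs = {v: [] for v in range(len(states))}
        for a, b in edges:
            nbrs[a].append(b)
            nbrs[b].append(a)
        for _ in range(steps):
            new = []
            for v, h in enumerate(states):
                m = [0.0] * DIM
                for u in nbrs[v]:
                    m = add(m, matvec(p["Wmsg"], states[u]))
                z = sigmoid(add(matvec(p["Wz"], m), matvec(p["Uz"], h)))
                r = sigmoid(add(matvec(p["Wr"], m), matvec(p["Ur"], h)))
                rh = [ri * hi for ri, hi in zip(r, h)]
                cand = tanh(add(matvec(p["Wh"], m), matvec(p["Uh"], rh)))
                new.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)])
            states = new
        return states

    def encode(self, nodes, edges, leaves):
        # Internal AST nodes start from their label embedding; token leaves get BiRNN context.
        states = [embed(label) for label in nodes]
        for i, vec in zip(leaves, self.birnn([nodes[i] for i in leaves])):
            states[i] = vec
        final = self.ggnn(states, edges)
        return [sum(col) / len(final) for col in zip(*final)]  # mean-pooling readout


# Toy ASTs for `a = b + 1` and `x = y + 1`: (labels, parent->child edges, token leaves in order).
frag_a = (["Assign", "a", "BinOp", "b", "1"], [(0, 1), (0, 2), (2, 3), (2, 4)], [1, 3, 4])
frag_b = (["Assign", "x", "BinOp", "y", "1"], [(0, 1), (0, 2), (2, 3), (2, 4)], [1, 3, 4])
enc = HybridEncoder(seed=0)
vec_a, vec_b = enc.encode(*frag_a), enc.encode(*frag_b)
sim = cosine(vec_a, vec_b)
print(f"cosine similarity: {sim:.3f}")
```

Because the weights here are random, the printed score carries no meaning; in a real system the encoder would be trained on labelled clone pairs, and a decision threshold on the similarity would be tuned, both of which are outside this sketch. The same encoder instance is applied to both fragments so that their vectors live in one shared space.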
International Journal of Software Engineering and Knowledge Engineering