Abstract

Intelligent deep learning-based models have made significant progress for automated source code semantics embedding, and current research works mainly leverage natural language-based methods and graph-based methods. However, natural language-based methods do not capture the rich semantic structural information of source code, and graph-based methods do not utilize rich distant information of source code due to the high cost of message-passing steps. In this paper, we propose a novel interpretable model, called graph tensor convolution neural network (GTCN), to generate accurate code embedding, which is capable of comprehensively capturing the distant information of code sequences and rich code semantics structural information. Firstly, we propose to utilize a high-dimensional tensor to integrate various heterogeneous code graphs with node sequence features, such as control flow, data flow. Secondly, inspired by the current advantages of graph-based deep learning and efficient tensor computations, we propose a novel interpretable graph tensor convolution neural network for learning accurate code semantic embedding from the code graph tensor. Finally, we evaluate three popular applications on the GTCN model, variable misuse detection, source code prediction, and vulnerability detection. Compared with current state-of-the-art methods, our model achieves higher scores with respect to the top-1 accuracy while costing less training time.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call