Smart contract vulnerability detection based on semantic graph and residual graph convolutional networks with edge attention

Da Chen, Lin Feng, Yuqi Fan, Siyuan Shang, Zhenchun Wei

doi:10.1016/j.jss.2023.111705

Abstract

Smart contracts are at the forefront of blockchain technology, enabling credible transactions without third parties. However, smart contracts are not immune to vulnerability exploitation and cannot be modified once deployed on the blockchain. As the number of smart contracts grows exponentially, it is therefore imperative to assure their security with intelligent vulnerability detection tools. Rapidly developing deep learning technology offers a promising way to detect potential smart contract vulnerabilities. Nevertheless, existing deep learning-based approaches fail to effectively capture the rich syntax and semantic information embedded in smart contracts. In this paper, we tackle smart contract vulnerability detection at the function level by constructing a novel semantic graph (SG) for each function and learning the SGs using graph convolutional networks (GCNs) with residual blocks and edge attention. Our proposed method consists of three stages. In the first stage, we create the SG, which contains rich syntax and semantic information, including the data–data, instruction–instruction, and instruction–data relationships, variables, operations, etc. We do so by building an abstract syntax tree (AST) from the code of each function, removing the unimportant nodes in the AST, and adding edges between the nodes to represent the data flows and the execution sequence of the statements. In the second stage, we propose a new graph convolutional network model, EA-RGCN, to learn the content and semantic features of the code. EA-RGCN contains three parts: node and edge representation via word2vec, content feature extraction with a residual GCN (RGCN) module, and semantic feature extraction using an edge attention (EA) module. In the third stage, we concatenate the code content features and the semantic features to obtain the global code feature and use a classifier to identify whether the function is vulnerable. We conduct experiments on datasets constructed from real-world smart contracts. Experimental results demonstrate that the proposed semantic graph and the EA-RGCN model effectively improve accuracy, precision, recall, and F1-score on smart contract vulnerability detection.
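
To make the first stage more concrete, the following Python sketch builds a toy semantic graph in the three steps the abstract names: start from an AST, remove unimportant nodes, and add edges for execution sequence and data flow. It is only an illustration under stated assumptions. The `Node` class, the node kinds (`Block`, `ExprStmt`, `VarDecl`, and so on), the pruning rule, the simple def/use heuristic, and the `withdraw` example are all hypothetical and are not taken from the paper, which may construct the SG differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # hypothetical AST node kind, e.g. "Assign"
    name: str = ""                 # identifier or operator text, if any
    children: list = field(default_factory=list)


# Assumed rules for this sketch only (not from the paper).
UNIMPORTANT = {"Block", "ExprStmt", "Paren"}   # wrapper nodes to drop
STATEMENTS = {"VarDecl", "Assign", "Call"}     # nodes treated as statements
DEFINING = {"VarDecl", "Assign"}               # first identifier is the target


def prune(node):
    """Drop unimportant nodes and splice their children into the parent."""
    kept = [c for child in node.children for c in prune(child)]
    if node.kind in UNIMPORTANT:
        return kept
    return [Node(node.kind, node.name, kept)]


def build_semantic_graph(root):
    """Return (nodes, edges); edges are (src_id, dst_id, edge_type) tuples."""
    nodes, edges, kids, stmt_ids = [], [], {}, []

    def visit(node, parent):
        nid = len(nodes)               # ids are assigned in preorder
        nodes.append(node)
        kids[nid] = []
        if parent is not None:
            edges.append((parent, nid, "ast"))
            kids[parent].append(nid)
        if node.kind in STATEMENTS:
            stmt_ids.append(nid)
        for child in node.children:
            visit(child, nid)

    def ident_ids(nid):
        found = [nid] if nodes[nid].kind == "Ident" else []
        for k in kids[nid]:
            found += ident_ids(k)
        return found

    visit(root, None)

    # Execution-sequence edges between consecutive statements.
    for a, b in zip(stmt_ids, stmt_ids[1:]):
        edges.append((a, b, "seq"))

    # Data-flow edges: latest definition of a variable -> each later use.
    last_def = {n.name: i for i, n in enumerate(nodes) if n.kind == "Param"}
    for sid in stmt_ids:
        ids = ident_ids(sid)
        defines = nodes[sid].kind in DEFINING and bool(ids)
        uses = ids[1:] if defines else ids
        for u in uses:
            if nodes[u].name in last_def:
                edges.append((last_def[nodes[u].name], u, "data"))
        if defines:
            last_def[nodes[ids[0]].name] = ids[0]
    return nodes, edges


# Hypothetical AST for: bal = balances; send(amount); balances = bal - amount
toy_ast = Node("Function", "withdraw", [
    Node("Param", "amount"),
    Node("Block", "", [
        Node("ExprStmt", "", [
            Node("VarDecl", "", [Node("Ident", "bal"), Node("Ident", "balances")]),
        ]),
        Node("ExprStmt", "", [Node("Call", "send", [Node("Ident", "amount")])]),
        Node("ExprStmt", "", [
            Node("Assign", "", [
                Node("Ident", "balances"),
                Node("BinOp", "-", [Node("Ident", "bal"), Node("Ident", "amount")]),
            ]),
        ]),
    ]),
])

pruned = prune(toy_ast)[0]
nodes, edges = build_semantic_graph(pruned)
for src, dst, etype in edges:
    if etype != "ast":
        print(etype, nodes[src].kind, nodes[src].name, "->", nodes[dst].kind, nodes[dst].name)
```

In this sketch the typed edge list (`ast`, `seq`, `data`) plays the role of the instruction–instruction and data relationships mentioned above. A downstream model such as the EA-RGCN described in the abstract would then consume embeddings of these nodes and edges, which this example does not attempt to reproduce.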

Full Text