Abstract

In recent years, the number of smart contracts running on blockchains has increased rapidly, and so have the security problems that accompany them, such as vulnerability propagation caused by code reuse and malicious transactions caused by the deployment of malicious contracts. Most smart contracts do not publish their source code, only their bytecode. Research on smart contract bytecode similarity therefore supports smart contract upgrade, vulnerability search, and malicious contract analysis. The main difficulty is that different compiler versions and optimization options produce diverse bytecode from the same source code. This paper presents a solution comprising a series of methods for measuring the similarity of smart contract bytecode. Starting from the opcodes of smart contracts, we propose a method for pre-training on basic block sequences, which produces basic block embedding vectors. Positive samples are obtained by basic block marking, and the negative sampling method is improved. The positive samples, the negative samples, and the basic blocks themselves are then fed into a triplet network composed of transformers. Our solution achieves an evaluation accuracy of 97.8%, so that basic block sequences produced with and without optimization can be transformed into each other. In addition, the instructions are normalized, and the instruction order across compiler versions is normalized. Experiments show that our solution effectively reduces the bytecode differences caused by optimization options and compiler versions, and improves accuracy by 1.4% over existing work. We provide a dataset covering 64 Solidity compiler versions in current use, including one million basic block pairs extracted from them.
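As a rough illustration of the triplet objective described above, the sketch below computes the standard triplet margin loss over toy basic-block embedding vectors, together with the usual accuracy formula (TP + TN) / (TP + TN + FP + FN). It assumes Euclidean distance and a margin of 1.0. The function names and vectors are hypothetical, and the paper's transformer encoder, distance function, and improved negative sampling are not reproduced here.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Triplet margin loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss reaches zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct decisions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total


if __name__ == "__main__":
    # Hypothetical 3-dimensional basic-block embeddings.
    anchor = [0.0, 0.0, 0.0]
    positive = [0.0, 0.0, 0.5]    # same block, different compiler options
    easy_neg = [3.0, 0.0, 0.0]    # unrelated block, already far away
    hard_neg = [0.0, 1.0, 0.0]    # unrelated block, still too close
    print(triplet_loss(anchor, positive, easy_neg))  # 0.0
    print(triplet_loss(anchor, positive, hard_neg))  # 0.5
    print(accuracy(tp=45, tn=40, fp=5, fn=10))       # 0.85
```

Only the "hard" negative contributes to the loss, which is why the choice of negative samples matters for this kind of training.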

Highlights

  • We consider four outcome counts for evaluation: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)

  • In smart contract opcodes, unnormalized constants may include memory addresses, function signatures, and transaction information, which can cause out-of-vocabulary (OOV) problems. In addition, in the Ethereum Virtual Machine (EVM), an instruction can in some cases be replaced by another instruction of the same category without changing the semantics
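A minimal sketch of the kind of normalization these highlights describe follows. It assumes disassembled instructions in the textual form `PUSH1 0x80`. Immediate operands are replaced with a placeholder token `CONST`, and numbered variants (`PUSHn`, `DUPn`, `SWAPn`) are collapsed into their category name. These rules and the function names are hypothetical illustrations, and the paper's exact scheme may differ. Note that collapsing `DUPn`/`SWAPn` discards stack-position information.

```python
import re
from typing import List

# Hypothetical rule: numbered variants of these opcodes collapse to a
# single category token (e.g. DUP1..DUP16 -> DUP).
_NUMBERED = re.compile(r"^(PUSH|DUP|SWAP)(\d+)$")


def normalize_instruction(line: str) -> str:
    """Normalize one disassembled EVM instruction, e.g. 'PUSH1 0x80'.

    - Any immediate operand (memory offset, function selector, address,
      ...) is replaced by the placeholder 'CONST' to avoid OOV tokens.
    - PUSHn / DUPn / SWAPn are reduced to their category name.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty instruction")
    opcode = parts[0].upper()
    match = _NUMBERED.match(opcode)
    if match:
        opcode = match.group(1)
    return f"{opcode} CONST" if len(parts) > 1 else opcode


def normalize_block(lines: List[str]) -> List[str]:
    """Normalize every instruction of a basic block, preserving order."""
    return [normalize_instruction(line) for line in lines]


if __name__ == "__main__":
    block = ["PUSH1 0x80", "PUSH1 0x40", "MSTORE",
             "PUSH4 0x70a08231",  # a 4-byte function selector
             "DUP2", "SWAP1", "EQ"]
    print(normalize_block(block))
    # ['PUSH CONST', 'PUSH CONST', 'MSTORE', 'PUSH CONST', 'DUP', 'SWAP', 'EQ']
```

After this step, two blocks that differ only in constant values or in which `DUPn`/`SWAPn` variant the compiler chose map to the same token sequence. This keeps the vocabulary small.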

Introduction

Intelligent computing has become part of our daily lives, and advanced computing methods and technologies have become increasingly complex [1,2,3]. The explosive growth of intelligent computing data has brought security problems. As a distributed technology that links data blocks in order, blockchain can help us overcome this challenge. It is decentralized, which reduces complexity, and it makes all data in the system open and transparent, which improves the security of intelligent computing. The combination of intelligent computing and blockchain will provide strong support for the development of intelligent computing.

