Context. With the development of blockchain technology and the increasing use of smart contracts, which are automatically executed in blockchain networks, the significance of securing these contracts has become extremely relevant. Traditional code auditing methods often prove ineffective in identifying complex vulnerabilities, which can lead to significant financial losses. For example, the reentrancy vulnerability that led to the DAO attack in 2016 resulted in the loss of 3.6 million ethers and the split of the Ethereum blockchain network. This underscores the necessity for early detection of vulnerabilities. Objective. The objective of this work is to develop and test an innovative approach for identifying and localizing vulnerabilities in smart contracts based on the analysis of attention vectors in a model using BERT architecture. Method. The methodology described includes data preparation and training a transformer-based model for analyzing smart contract code. The proposed attention vector analysis method allows for the precise identification of vulnerable code segments. The use of the CodeBERT model significantly improves the accuracy of vulnerability identification compared to traditional methods. Specifically, three types of vulnerabilities are considered: reentrancy, timestamp dependence, and tx.origin vulnerability. The data is preprocessed, which includes the standardization of variables and the simplification of functions. Results. The developed model demonstrated a high F-score of 95.51%, which significantly exceeds the results of contemporary approaches, such as the BGRU-ATT model with an F-score of 91.41%. The accuracy of the method in the task of localizing reentrancy vulnerabilities was 82%. Conclusions. The experiments conducted confirmed the effectiveness of the proposed solution. Prospects for further research include the integration of more advanced deep learning models, such as GPT-4 or T5, to improve the accuracy and reliability of vulnerability detection, as well as expanding the dataset to cover other smart contract languages, such as Vyper or LLL, to enhance the applicability and efficiency of the model across various blockchain platforms. Thus, the developed CodeBERT-based model demonstrates high results in detecting and localizing vulnerabilities in smart contracts, which opens new opportunities for research in the field of blockchain platform security.
Read full abstract