Abstract

With the growing popularity of IoT (Internet of Things) devices, the security risks they pose are increasing. However, because IoT devices are multisource and heterogeneous, vulnerability detection for IoT differs significantly from PC-based vulnerability search. Determining how to accurately search for vulnerabilities in large-scale cross-platform binary executable files is therefore an urgent problem. Most current solutions calculate code similarity by generating a CFG (control flow graph) from binary code, but depending on the choice of architecture, OS (operating system) or compilation options, the same source code is compiled into different assembly code, which challenges the performance of existing vulnerability search methods on cross-architecture binaries. To alleviate the large differences in assembly code caused by different compilation scenarios, this paper proposes a cross-platform, large-scale binary vulnerability search method based on two-level feature semantic learning. Our first contribution is a new structured function signature method that mitigates the large syntactic and structural differences between binary files caused by different compilation environments. Moreover, we integrate Structure2Vec and GAT (graph attention network) into a hierarchical model and train it on both the internal control flow characteristics of each function and the call relationships between functions to obtain a more accurate semantic representation of functions.

Highlights

  • Using open source code or using third-party libraries is a common approach in the development process, and the same vendor often reuses code, which provides fertile ground for the generation and survival of vulnerabilities

  • We propose a vulnerability search method based on hierarchical semantic learning [44] and implement a prototype for verification experiments

  • Most existing graph-based function similarity calculation methods extract features directly from the function's CFG, PDG (program dependency graph) [8], AST (abstract syntax tree) [9], etc., but depending on the choice of architecture, OS or compilation options, the same source code may be compiled into assembly code with different structures, so the function features extracted by these methods cannot accurately express the function semantics [35]
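
The graph-based similarity calculation described in the highlights can be illustrated with a minimal sketch. It assumes a Structure2Vec-style neighbor propagation over a CFG with untrained identity weights, followed by cosine similarity between the resulting graph vectors. The names `embed_cfg` and `cosine_similarity`, the per-block feature vectors and the number of rounds are all hypothetical choices for illustration, not the paper's implementation.

```python
import math


def embed_cfg(features, edges, rounds=3):
    """Embed a CFG into one vector (illustrative, untrained sketch).

    features: dict mapping basic-block id -> list of floats
    edges:    list of (src, dst) control-flow edges
    """
    # Assumption: edges are treated as undirected during propagation.
    neighbors = {v: [] for v in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    dim = len(next(iter(features.values())))
    mu = {v: [0.0] * dim for v in features}

    for _ in range(rounds):
        new_mu = {}
        for v, x in features.items():
            # Sum the current embeddings of neighboring blocks.
            agg = [sum(mu[u][i] for u in neighbors[v]) for i in range(dim)]
            # Assumption: identity weights instead of learned matrices.
            new_mu[v] = [math.tanh(x[i] + agg[i]) for i in range(dim)]
        mu = new_mu

    # Graph-level vector: sum of all block embeddings.
    return [sum(mu[v][i] for v in mu) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0
```

In a trained model the identity weights would be replaced by learned parameters, so that two compilations of the same source function map to nearby vectors even when their CFGs differ.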

Summary

INTRODUCTION

Using open source code or third-party libraries is a common approach in the development process, and the same vendor often reuses code, which provides fertile ground for the generation and survival of vulnerabilities.

CONTRIBUTIONS

In summary, our main contributions are the following:

1) Guided by manual vulnerability search, we propose a solution to reduce the impact of different compilation environments on function binaries.
2) We attach the function data flow to the CFG and design a model-oriented GA to select suitable features and obtain more complete semantics.
3) We apply GAT, an artificial neural network, to construct a network architecture based on an attention mechanism over neighbor nodes, taking the call relationships between functions into account to generate richer semantic representations.
4) We establish a hierarchical model that fuses the GAT [5] model and the Structure2Vec [6] model, and train them jointly on the intra-function characteristics and the call relationships between functions to achieve a more accurate function similarity comparison.
5) We implemented a prototype called BiN. Our evaluation shows that BiN achieves a higher AUC than other state-of-the-art graph-based matching methods on the test set built from OpenSSL and BusyBox.
6) We tested our prototype on a larger data set, and the results showed that our implementation is accurate and efficient enough to handle real-world vulnerability detection efforts.
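
The attention mechanism over neighbor nodes mentioned in the contributions can be sketched as a single-head graph attention step on a function call graph. This is a simplified illustration, not BiN's implementation: the learned weight matrix and output nonlinearity of a full GAT layer are omitted, and the names `gat_layer` and `attn` as well as the example function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU, the nonlinearity GAT applies to attention scores."""
    return x if x > 0 else slope * x


def gat_layer(h, call_edges, attn):
    """One simplified single-head attention step over a call graph.

    h:          dict mapping function name -> embedding (list of floats)
    call_edges: list of (caller, callee) pairs
    attn:       attention vector of length 2 * embedding dimension
    """
    # Each function attends to itself and to its callers and callees.
    neighbors = {f: [f] for f in h}
    for caller, callee in call_edges:
        neighbors[caller].append(callee)
        neighbors[callee].append(caller)

    out = {}
    for i, hi in h.items():
        # Score e_ij = LeakyReLU(attn . [h_i || h_j]) for each neighbor j.
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attn, hi + h[j])))
            for j in neighbors[i]
        ]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New embedding: attention-weighted sum of neighbor embeddings.
        out[i] = [
            sum(al * h[j][k] for al, j in zip(alphas, neighbors[i]))
            for k in range(len(hi))
        ]
    return out
```

With an all-zero attention vector every neighbor receives equal weight, so the step reduces to averaging a function's embedding with those of its callers and callees. A trained attention vector would instead emphasize the neighbors most informative for the function's semantics.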

BACKGROUND
SEMANTIC LEARNING PREDICTOR
INTRA-FUNCTION FEATURE LEARNING MODEL
EVALUATION
ACCURACY OF VULNERABILITY SEARCH
RELATED WORK
CONCLUSION