Abstract

Applying neural network technology to binary similarity detection has become a promising research topic, and vulnerability detection is an important application of binary similarity detection. When a neural network embeds binary code into a matrix, the problem of feature representation must also be solved for vulnerability detection. However, most current studies extract the syntactic or structural features of binary code and take the basic block as the minimum analysis unit, which is relatively coarse. In addition, the structural features of binary functions are usually represented by a dependency graph, and during embedding only the neighbour information of each node is captured, so the global information of the graph is ignored. To solve these two problems, we propose a two-channel feature extraction method that obtains semantic features at a finer granularity and represents structural features globally instead of locally. Inspired by natural language processing, we propose a contextual semantic feature extraction method to obtain features of binary functions at different granularities. It takes the instruction as the minimum analysis unit and captures the semantic relationships between instructions. Meanwhile, to represent the structural features of each function, we propose a neural GAE model instead of the widely used structure2vec model. In this way, we can preserve and reconstruct the control dependencies between the basic blocks across the whole graph. We have implemented a prototype system, BEDetector, evaluated the effectiveness of its neural model, and compared its accuracy in detecting vulnerable functions with the state of the art. In addition, we choose real-world firmware files as the detection target and show that BEDetector achieves a relatively high detection rate. BEDetector reaches a precision of 88.8%, 86.7% and 100% when ranking the top-50 candidate functions in the detection of the CVE vulnerability functions ssl3_get_key_exchange, ssl3_get_new_session_ticket and udhcp_get_option, respectively, demonstrating the effectiveness of our method.
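
To illustrate what reconstructing control dependencies across a whole graph can look like, the following is a minimal sketch of one forward pass of a generic graph autoencoder, assuming that GAE denotes a graph autoencoder with a single linear graph-convolution encoder and an inner-product decoder. It is not BEDetector's actual model: the diamond-shaped control-flow graph, the one-hot block features, the random untrained weights, and all function names are hypothetical, and edges are treated as undirected for simplicity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gae_forward(adj, features, weights):
    """Encode every node, then decode a score for every node pair."""
    a_norm = normalize_adjacency(adj)
    z = matmul(matmul(a_norm, features), weights)  # encoder: one linear graph convolution
    z_t = [list(col) for col in zip(*z)]
    logits = matmul(z, z_t)                        # decoder: inner product of embeddings
    recon = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]
    return z, recon


def reconstruction_loss(adj, recon):
    """Mean binary cross-entropy between the true and reconstructed adjacency."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = recon[i][j]
            total -= adj[i][j] * math.log(p) + (1.0 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)


# Hypothetical control-flow graph with four basic blocks: 0-1, 0-2, 1-3, 2-3.
ADJ = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
# Placeholder one-hot block features; a real system would use learned block embeddings.
FEATURES = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

rng = random.Random(0)
WEIGHTS = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)]  # untrained

if __name__ == "__main__":
    embeddings, reconstructed = gae_forward(ADJ, FEATURES, WEIGHTS)
    print("loss:", reconstruction_loss(ADJ, reconstructed))
```

In this sketch the decoder scores every pair of basic blocks, so the reconstruction loss depends on the entire adjacency matrix rather than on each node's neighbours alone. Training would adjust the weights to minimize that loss.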

