Abstract
Binary code homology analysis refers to detecting whether two pieces of binary code are compiled from the same piece of source code, which is a fundamental technique for many security applications, such as vulnerability search, plagiarism detection, and malware detection. With the increase in critical vulnerabilities in IoT devices, homology analysis is increasingly needed to perform cross-platform vulnerability searches. Existing methods for cross-platform binary code homology detection usually convert binary code to instruction sequences and do semantic embedding of the sequences as if they were natural language. However, the gap between natural language and binary code is large, and the spatial features of the binary code are easily lost by directly comparing the semantics. In this paper, we propose a GRU-based graph embedding method to compare the homology of binary functions. First, the attribute control flow graph (ACFG) is built for the assembly function, then the GRU-based graph embedding neural network is used to generate the embedding vector for the ACFG, and finally the homology of the binary code is determined by calculating the distance between the embedding vectors. The experimental results show that our method greatly improves the detection accuracy of negative samples compared with Gemini, the latest method based on graph embedding binary code similarity detection.
Highlights
With the rise and development of the Internet of ings technology, more and more embedded devices carry out network communications, and some of the security issues that exist among them have become increasingly prominent
Evaluation Indicators. e main objective of this paper is to detect if two binary codes from different platforms are homologous, so deep learning is used in this topic to solve the binary classification problem. e common evaluation metrics used in the deep learning application problem of binary classification are accuracy, true negative rate, recall, AUC, and so on
(i) True Positive (TP) means that positive cases will be predicted as positive classes (ii) False Positive (FP) means that negative cases will be predicted as positive classes (iii) True Negative (TN) means that negative cases will be predicted as negative classes (iv) False Negative (FN) means that positive cases will be predicted as negative classes
Summary
With the rise and development of the Internet of ings technology, more and more embedded devices carry out network communications, and some of the security issues that exist among them have become increasingly prominent. Erefore, it is urgent to find a reasonable vulnerability analysis method for embedded device firmware to effectively detect the homology of similar code. Current binary code homology analysis mainly uses dynamic tracing or static analysis to obtain feature information, such as instruction sequences [1], API call sequences [2], or graph structure features [3]. Sequence information is easier to obtain than graph structure information, so most researchers conduct research on the basis of instruction sequence or API sequence, treat the sequence as a natural language, and use semantic embedding methods to obtain semantic features. Compared with graph structure information, semantic features usually have larger dimensions leading to lower detection efficiency and lose spatial features in binary code execution, such as function call relationships and basic block call relationships, which are similar when cross-platform. Xu et al [5] proposed a neural network-based approach, Gemini, which shows great
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.