PBDiff: Neural network based program-wide diffing method for binaries.

Lu Yu,Yuliang Lu,Jiazhen Zhao,Yi Shen,Jun Zhao

doi:10.3934/mbe.2022127

Abstract

Program-wide binary code diffing is widely used in the binary analysis field, such as vulnerability detection. Mature tools, including BinDiff and TurboDiff, make program-wide diffing using rigorous comparison basis that varies across versions, optimization levels and architectures, leading to a relatively inaccurate comparison result. In this paper, we propose a program-wide binary diffing method based on neural network model that can make diffing across versions, optimization levels and architectures. We analyze the target comparison files in four different granularities, and implement the diffing by both top down process and bottom up process according to the granularities. The top down process aims to narrow the comparison scope, selecting the candidate functions that are likely to be similar according to the call relationship. Neural network model is applied in the bottom up process to vectorize the semantic features of candidate functions into matrices, and calculate the similarity score to obtain the corresponding relationship between functions to be compared. The bottom up process improves the comparison accuracy, while the top down process guarantees efficiency. We have implemented a prototype PBDiff and verified its better performance compared with state-of-the-art BinDiff, Asm2vec and TurboDiff. The effectiveness of PBDiff is further illustrated through the case study of diffing and vulnerability detection in real-world firmware files.

Full Text