IFAttn: Binary code similarity analysis based on interpretable features with attention

Shuai Jiang, Cai Fu, Yekui Qian, Shuai He, Jianqiang Lv, Lansheng Han

doi:10.1016/j.cose.2022.102804

Abstract

Binary code similarity analysis (BCSA) is useful in various software security applications, including vulnerability discovery, clone detection, and malware detection. Although many recent BCSA studies are based on neural networks, several significant problems remain unsolved. First, most existing methods focus on the function pair similarity detection task (FPSDT) while ignoring the function search task (FST), which is more important in vulnerability discovery. Second, they concentrate on the final result, improving the FPSDT success rate with unexplainable neural networks. Finally, in practice, most methods struggle to withstand cross-optimization and cross-obfuscation in BCSA. To solve these problems, we are the first to propose an adaptive BCSA architecture that combines interpretable feature engineering with a learnable attention mechanism. We design an adaptive model with rich interpretable features, and its experimental results on FPSDT and FST are better than those of state-of-the-art methods. In addition, we find that the attention mechanism has outstanding advantages in expressing the semantics of functions. Finally, the evaluation shows that our approach can significantly improve FST performance on cross-architecture, cross-optimization, cross-obfuscation, and cross-compiler binaries.

Full Text