Semantics-aware obfuscation scheme prediction for binary

Yujie Zhao, Zhanyong Tang, Guixin Ye, Dongxu Peng, Dingyi Fang, Xiaojiang Chen, Zheng Wang

doi:10.1016/j.cose.2020.102072


Open Access



Journal: Computers & Security
Publication Date: Oct 3, 2020
Citations: 6
License type: CC BY-NC-ND

Affiliation: Jingdong (China), University of Leeds

Abstract

By restoring a program to a form that is easier to understand, deobfuscation is an important technique for detecting and analyzing malicious software. To enable deobfuscation, one must know whether the target program is obfuscated and, if so, which obfuscation schemes may have been used. However, obtaining such information is challenging without access to the original source code. This paper presents a new way to estimate the obfuscation scheme of a compiled binary. It does so by using semantic information from the disassembled binary to predict whether the program has been obfuscated and, if so, which type of obfuscation scheme may have been used. At the core of our approach is a set of deep neural networks that can effectively characterize and leverage the contextual information available in the assembly code. Our models are first trained offline, and the learned models can then be applied to new, previously unseen obfuscated binaries. We evaluate our approach on a large dataset of over 277,000 obfuscated samples produced with different individual obfuscation schemes and their combinations. Experimental results show that our approach is highly effective in identifying the obfuscation scheme, with a prediction accuracy of at least 83% and up to 98%.
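
The abstract does not specify how inputs or labels are encoded. As a rough illustration only, and not the authors' implementation, the Python sketch below shows one plausible way to prepare data for this kind of classifier. It assumes, hypothetically, three individual schemes named after the Obfuscator-LLVM passes `sub`, `bcf` and `fla`, and treats every subset of them as one class, with the empty set meaning "not obfuscated". It also replaces numeric operands in disassembled instructions with an `IMM` placeholder and cuts the token sequence into fixed-size context windows that a neural model could consume. All function names (`build_label_space`, `normalize_instruction`, `context_windows`) are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical individual schemes, named after the Obfuscator-LLVM passes
# (instruction substitution, bogus control flow, control-flow flattening).
# The scheme set used in the paper may differ.
SCHEMES = ("sub", "bcf", "fla")

# Hexadecimal literals such as 0x401000 or 0x10 (matched after lower-casing).
_HEX = re.compile(r"\b0x[0-9a-f]+\b")
# Decimal literals (optionally negative) that are not part of a register
# or identifier name, so "r8" and "xmm0" are left untouched.
_DEC = re.compile(r"(?<!\w)-?\d+\b")


def build_label_space(schemes):
    """Assign a class index to every subset of `schemes`.

    The empty subset means "not obfuscated", singletons are individual
    schemes and larger subsets are combinations of schemes.
    """
    space = {}
    for k in range(len(schemes) + 1):
        for combo in combinations(schemes, k):
            space[frozenset(combo)] = len(space)
    return space


def normalize_instruction(line):
    """Lower-case one instruction and replace numeric operands with IMM."""
    text = _HEX.sub("IMM", line.strip().lower())
    text = _DEC.sub("IMM", text)
    # Collapse runs of whitespace left over from the disassembler output.
    return " ".join(text.split())


def context_windows(tokens, size):
    """Return every run of `size` consecutive tokens (empty if too short)."""
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    labels = build_label_space(SCHEMES)
    print(len(labels))  # 8 classes for 3 schemes
    asm = ["push rbp", "mov rbp, rsp", "sub rsp, 0x10", "mov DWORD PTR [rbp-4], 0"]
    tokens = [normalize_instruction(i) for i in asm]
    print(tokens)
    print(context_windows(tokens, 3))
```

Enumerating every subset gives 2^n classes for n individual schemes, which stays manageable for a handful of schemes. With many schemes, a multi-label setup (one binary output per scheme) would be a common alternative to a single multi-class output.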
