Abstract

By restoring a program to a more easily understood form, deobfuscation is an important technique for detecting and analyzing malicious software. To enable deobfuscation, one must know whether the target program is obfuscated and, if so, which obfuscation schemes may have been used. However, obtaining this information is challenging without access to the original source code. This paper presents a new way to estimate the obfuscation scheme of a compiled binary. It uses semantic information from the disassembled binary to predict whether the program has been obfuscated and, if so, which type of obfuscation scheme may have been used. At the core of our approach is a set of deep neural networks that can effectively characterize and leverage the contextual information available in the assembly code. Our models are first trained offline, and the learned models can then be applied to new, previously unseen obfuscated binaries. We evaluate our approach on a large dataset of over 277,000 obfuscated samples covering individual obfuscation schemes and their combinations. Experimental results show that our approach is highly effective at identifying the obfuscation scheme, with a prediction accuracy of at least 83% and up to 98%.
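The abstract does not describe how assembly code is fed to the networks or how combined schemes are labelled. The sketch below is only an illustration under stated assumptions, not the authors' pipeline. It shows two surrounding steps such a classifier plausibly needs: (1) normalizing disassembled instructions so that concrete constants and addresses are replaced by a placeholder token, and (2) encoding "individual schemes and their combinations" as a multi-hot label vector, then decoding per-scheme probabilities from some trained model. The scheme names in SCHEMES are hypothetical examples, taken from the passes shipped with Obfuscator-LLVM. The helpers normalize_instruction, encode_labels, and decode_prediction, the "IMM" placeholder, and the 0.5 threshold are all assumptions. The paper may just as well treat each combination as its own class.

```python
import re
from typing import List, Sequence

# Hypothetical label set (assumption): these mirror Obfuscator-LLVM's passes;
# the abstract does not name the schemes that were actually studied.
SCHEMES: List[str] = [
    "bogus_control_flow",
    "control_flow_flattening",
    "instruction_substitution",
]

_HEX_IMM = re.compile(r"\b0x[0-9a-fA-F]+\b")  # hex constants / addresses
_DEC_IMM = re.compile(r"\b\d+\b")             # standalone decimal constants


def normalize_instruction(line: str) -> str:
    """Turn one disassembled instruction into a coarse token string.

    Constants are replaced with the hypothetical placeholder "IMM" so a model
    sees the shape of the instruction rather than concrete values. Register
    names such as r8 are kept because the digit is not at a word boundary.
    """
    text = line.strip().lower()
    text = _HEX_IMM.sub("IMM", text)
    text = _DEC_IMM.sub("IMM", text)
    # Make commas separate tokens, then collapse repeated whitespace.
    return " ".join(text.replace(",", " , ").split())


def encode_labels(applied: Sequence[str]) -> List[int]:
    """Multi-hot target: 1 for every scheme applied (combinations allowed).

    An all-zero vector stands for an unobfuscated sample.
    """
    unknown = set(applied) - set(SCHEMES)
    if unknown:
        raise ValueError(f"unknown scheme(s): {sorted(unknown)}")
    return [1 if s in applied else 0 for s in SCHEMES]


def decode_prediction(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map per-scheme probabilities from some trained model back to names.

    An empty result means the sample is predicted to be unobfuscated.
    """
    if len(probs) != len(SCHEMES):
        raise ValueError("expected one probability per scheme")
    return [s for s, p in zip(SCHEMES, probs) if p >= threshold]


if __name__ == "__main__":
    asm = ["mov eax, 0x10", "add eax, 4", "jmp 0x401000"]
    print([normalize_instruction(i) for i in asm])
    print(encode_labels(["control_flow_flattening", "instruction_substitution"]))
    print(decode_prediction([0.1, 0.9, 0.7]))
```

The multi-hot encoding lets a single output layer report any combination of schemes with one sigmoid per scheme. The alternative, one softmax class per combination, grows combinatorially with the number of schemes but can capture interactions between them more directly.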

