Abstract

Decompilation analyzes and transforms low-level programming language (PL) code, such as binary or assembly code, into equivalent high-level PL code. It plays a vital role in cyberspace security tasks such as software vulnerability discovery and analysis and malicious code detection and analysis, and in software engineering tasks such as source code analysis, optimization, and cross-language, cross-operating-system migration. Unfortunately, existing decompilers rely mainly on expert-written rules, which leads to bottlenecks such as low scalability, difficult development, and long development cycles. The generated high-level PL code often violates coding conventions, and its readability remains relatively low. These problems reduce the efficiency of advanced applications (e.g., vulnerability discovery) built on decompiled high-level PL code. In this paper, we propose a decompilation approach based on the attention-based neural machine translation (NMT) mechanism, which converts low-level PL into high-level PL that is readable and functionally similar to the original. To compensate for the information asymmetry between low-level and high-level PL, we design a translation method based on the basic operations of low-level PL. This method improves the generalization of the NMT model and captures the translation rules between PLs more accurately and efficiently. We also implement a neural decompilation framework called Neutron. An evaluation on two practical applications shows that Neutron’s average program accuracy is 96.96%, which is better than that of the traditional NMT model.

Highlights

  • Decompilation aims to convert compiled low-level programming language (PL) code, such as executable programs or assembly code in an intermediate representation, into functionally equivalent high-level PL code that is easy to read.

  • Conventional decompilation mainly depends on computer scientists to define decompilation rules through control flow analysis, converting low-level PL code into an intermediate-language or high-level-language representation that is easier for humans to read (Durfina et al 2011; Durfina et al 2013; Yakdan et al 2016; Yakdan et al 2015; Brumley et al 2013).

  • We introduce the attention-based neural machine translation (NMT) model (Luong et al 2015) as the decompilation model; its architecture is shown in Fig. 4.
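
To make the attention step concrete, the following is a minimal sketch of global attention with the dot-product score, one of the scoring variants described by Luong et al (2015). It is an illustration only: the source does not state which score function Neutron uses, and the function names, the toy vectors, and the framing of encoder states as low-level PL tokens are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def luong_dot_attention(decoder_state, encoder_states):
    """Global attention with the dot score (hypothetical helper).

    decoder_state: current decoder hidden state (list of floats).
    encoder_states: one hidden state per source token, e.g. per
    low-level PL token (list of lists of floats).
    Returns the alignment weights and the context vector.
    """
    # Score each source position against the decoder state.
    scores = [dot(decoder_state, h) for h in encoder_states]
    # Normalize the scores into alignment weights.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values (assumed): three source tokens with 2-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = luong_dot_attention(decoder_state, encoder_states)
```

In a full NMT model the context vector is combined with the decoder state to predict the next target token; here that would be the next high-level PL token. That step is omitted from this sketch.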


Summary

Introduction

Decompilation aims to convert compiled low-level PL code, such as executable programs or assembly code in an intermediate representation, into functionally equivalent high-level PL code that is easy to read. Current representative decompilers include Phoenix (Brumley et al 2013), Hex-Rays (2020), RetDec (Kroustek et al 2017), and Ghidra (2020). Both Hex-Rays and Phoenix rely on pattern matching to identify the program’s high-level control flow structure and change the control flow graph (CFG) of the program. Although the resulting code is semantically equivalent to the original low-level PL code, it is difficult to read and relatively inefficient. In response to this problem, researchers have pursued goto-free decompilation, such as DREAM++ (Yakdan et al 2016; Yakdan et al 2015), which can restore all control structures in binary programs and generate structured decompiled code without any goto statements.

