Abstract

Data obfuscation is commonly used by malicious software to avoid detection and reverse analysis. When analyzing malware, such obfuscations have to be removed to restore the program to a more easily understood form (deobfuscation). Deobfuscation based on program synthesis offers a good solution because it treats the target program as a black box. Deobfuscation thus becomes the problem of finding the shortest instruction sequence that synthesizes a program with the same input-output behavior as the target program. Existing work has two limitations: it assumes that the obfuscated code snippets in the target program are already known, and it uses a stochastic search algorithm, which results in low efficiency. In this paper, we propose fine-grained obfuscation detection that locates obfuscated code snippets by machine learning. We also combine program synthesis with a heuristic search algorithm, Nested Monte Carlo Search. We have applied a prototype implementation of our ideas to data obfuscation produced by different tools, including OLLVM and Tigress. Our experimental results suggest that this approach is highly effective in locating and deobfuscating binaries with data obfuscation, with an accuracy of at least 90.34%. Compared with the state-of-the-art deobfuscation technique, our approach’s efficiency has increased by 75%, with the success rate increasing by 5%.

Highlights

  • Data obfuscation is the transformation that obscures the data structures used in the application [1]. The goal of this obfuscation technique consists of replacing standard binary operators with functionally equivalent but more complicated sequences of instructions.

  • AutoSimpler contains three components: an obfuscation detector, a program synthesizer, and a search engine. It takes in a target program with obfuscated instructions and outputs a program that is much easier to understand. The obfuscation detector’s goal is to find the obfuscated code snippets in the target program through a trained machine learning model.

  • (ii) Study 2: What is the performance of the program synthesizer in AutoSimpler, including its success rate and execution time? The most critical question is how to determine that the deobfuscated result is correct and simplified.


Summary

Introduction

Data obfuscation is the transformation that obscures the data structures used in the application [1]. The goal of this obfuscation technique is to replace standard binary operators (such as addition, subtraction, or Boolean operators) with functionally equivalent but more complicated sequences of instructions. Deobfuscation is a transformation that removes obfuscation effects from the target program; it is the reverse process of code obfuscation. Thus, the second research question raised is RQ2: how to optimize the search efficiency of program synthesis while improving the success rate? AutoSimpler relies on the heuristics of Nested Monte Carlo Search to improve program synthesis efficiency. We replace the stochastic search with a heuristic search in the deobfuscation framework that combines program synthesis and artificial intelligence, improving the efficiency of program synthesis while ensuring a slight increase in the success rate. We implement a prototype of AutoSimpler and evaluate its accuracy and efficiency. The experimental results indicate that AutoSimpler is a deobfuscation tool for data-obfuscated binaries with a high accuracy of 90.34% and 23 seconds per task.
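To make the operator replacement and the black-box view concrete, the following is a minimal sketch and not AutoSimpler's implementation. It assumes 32-bit words, uses the well-known identity x + y = (x ^ y) + 2 * (x & y) as the obfuscated snippet, and recovers a simpler equivalent purely from sampled input-output pairs. The candidate set and the function names (`obfuscated`, `sample_io`, `synthesize`) are hypothetical, and a plain exhaustive scan over five candidates stands in for the search engine; it is not Nested Monte Carlo Search.

```python
import random

MASK = 0xFFFFFFFF  # assumption: 32-bit machine words


def obfuscated(x, y):
    # Mixed Boolean-arithmetic encoding of x + y.
    return ((x ^ y) + 2 * (x & y)) & MASK


# Hypothetical candidate set: one binary operator applied to both inputs.
CANDIDATES = {
    "x + y": lambda x, y: (x + y) & MASK,
    "x - y": lambda x, y: (x - y) & MASK,
    "x ^ y": lambda x, y: x ^ y,
    "x & y": lambda x, y: x & y,
    "x | y": lambda x, y: x | y,
}


def sample_io(target, n=64, seed=0):
    # Treat the target as a black box: only observe outputs for random inputs.
    rng = random.Random(seed)
    inputs = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(n)]
    return [(x, y, target(x, y)) for x, y in inputs]


def synthesize(target):
    # Return the first candidate that agrees with the target on every sample.
    samples = sample_io(target)
    for name, candidate in CANDIDATES.items():
        if all(candidate(x, y) == out for x, y, out in samples):
            return name
    return None


print(synthesize(obfuscated))  # prints "x + y"
```

Agreement on sampled inputs only suggests equivalence; it does not prove it. This is why determining that a deobfuscated result is actually correct and simplified remains a central question.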

Background
Methods
Overview of Our Approach
Implementation Details
Output of Candidate Program
Program Synthesizer
Experimental Setup
Evaluation Metrics
Experimental Results
Results
Related Work