Invalidating Analysis Knowledge for Code Virtualization Protection Through Partition Diversity

Wei Wang, Jie Ren, Fuwei Wang, Guixin Ye, Huanting Wang, Zheng Wang, Meng Li, Zhanyong Tang, Xiaoqing Gong, Dingyi Fang

doi:10.1109/access.2019.2954165

Abstract

To protect programs from unauthorized analysis, virtualizing code with Virtual Machine (VM) technologies is emerging as a feasible method for code obfuscation. However, in some state-of-the-art VM-based protection approaches, the set of virtual instructions and the bytecode interpreter are fixed across an entire program. An experienced attacker could therefore extract the mapping between virtual instructions and native code from one program and use this knowledge to uncover the mappings in similarly protected applications. To address this problem, this paper presents CoDiver (Code Virtualization Protection with Diversity), a novel VM-based code obfuscation system. The main idea of our approach is to obfuscate the mapping between the opcodes of bytecode instructions and their semantics. To achieve this, we first partition every protected code region into multiple parts and then randomize the mapping between opcodes and their semantics within each part. In this way, the same bytecode instruction can be translated into different native code in different sections of the obfuscated code, which significantly increases the diversity of program behavior. As a result, a mapping between bytecode and native code learned from other programs becomes useless when migrated to a new program. We built a prototype of CoDiver and tested it on a set of real-world applications. Experimental results show that, compared with two state-of-the-art VM-based code obfuscation approaches, our approach is more effective and provides stronger protection with comparable runtime overhead and code size.
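The partition-and-randomize idea can be illustrated with a toy stack-based VM. This is a minimal sketch under stated assumptions, not CoDiver's implementation: the semantic set (`PUSH`, `ADD`, `SUB`, `MUL`, `DUP`, `SWAP`), the helper names `random_mapping`, `encode`, `protect` and `run`, and the choice to split the instruction sequence into equal-sized contiguous chunks are all hypothetical. Each partition receives its own random opcode-to-semantics table, and the interpreter switches tables at partition boundaries, so the same opcode value can denote different operations in different parts while the program still computes the same result.

```python
import random

# Hypothetical semantic set for a toy stack VM; CoDiver's real
# virtual instruction set is not described here.
SEMANTICS = ["PUSH", "ADD", "SUB", "MUL", "DUP", "SWAP"]


def random_mapping(rng):
    """Randomly assign opcode values 0..n-1 to the semantic operations."""
    opcodes = list(range(len(SEMANTICS)))
    rng.shuffle(opcodes)
    return dict(zip(opcodes, SEMANTICS))


def encode(chunk, mapping):
    """Translate (operation, operand...) tuples into bytecode under `mapping`."""
    inverse = {sem: op for op, sem in mapping.items()}
    code = []
    for sem, *operands in chunk:
        code.append(inverse[sem])
        code.extend(operands)  # only PUSH carries an immediate operand
    return code


def protect(program, parts, seed):
    """Hypothetical partitioning: split `program` into contiguous chunks,
    each encoded with its own freshly randomized opcode mapping."""
    rng = random.Random(seed)
    size = -(-len(program) // parts)  # ceiling division
    protected = []
    for start in range(0, len(program), size):
        mapping = random_mapping(rng)
        protected.append((mapping, encode(program[start:start + size], mapping)))
    return protected


def run(protected):
    """Interpret each partition with that partition's own decoding table."""
    stack = []
    for mapping, code in protected:
        pc = 0
        while pc < len(code):
            sem = mapping[code[pc]]
            pc += 1
            if sem == "PUSH":
                stack.append(code[pc])
                pc += 1
            elif sem == "DUP":
                stack.append(stack[-1])
            elif sem == "SWAP":
                stack[-1], stack[-2] = stack[-2], stack[-1]
            else:  # binary arithmetic: ADD, SUB, MUL
                b, a = stack.pop(), stack.pop()
                stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[sem])
    return stack.pop()


# (2 + 3) * 4 - 1, split into three partitions
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4),
           ("MUL",), ("PUSH", 1), ("SUB",)]
protected = protect(program, parts=3, seed=7)
print(run(protected))  # 19
for i, (mapping, _) in enumerate(protected):
    print(f"partition {i}: opcode 0 means {mapping[0]}")
```

Because each table is a bijection, decoding within a partition is unambiguous, yet an analyst who recovers one partition's table cannot assume it holds in any other partition.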

Highlights

For software developers, unauthorized code reverse engineering is an important threat
The difficulty varies across stages, and the most time-consuming is the first: understanding the semantics of individual bytecode instructions
We introduce Instruction Set Randomization (ISR) [17] to change the opcodes of bytecode instructions and their semantics randomly

Summary

INTRODUCTION

For software developers, unauthorized code reverse engineering is an important threat. Attacking virtualization-protected code involves two steps: understanding the semantics of individual bytecode instructions, and translating the bytecode back into native machine instructions or even high-level programming languages [12], [13]. Of these two steps, the first is the most time-consuming. Most prior studies focus on increasing the diversity of program behavior by obfuscating the handler implementation [14] or by using different interpretation techniques to transform a single program through multiple iterations [15], [16]. These previous works use a static strategy that converts each piece of native code into a fixed set of bytecode. Test results show that, compared with two commercial code obfuscation tools, CoDiver provides stronger protection with similar code size and runtime overhead.
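Why a static strategy leaks reusable knowledge can be quantified with a small experiment. In this hypothetical sketch (the semantic set and the helper names `random_mapping` and `transfer_accuracy` are invented for illustration), an attacker reuses an opcode-to-semantics mapping recovered from one program to decode another. Under a single fixed mapping the reuse is fully accurate; under independently randomized per-partition mappings it is correct only by chance.

```python
import random

# Hypothetical toy semantic set; not CoDiver's actual instruction set.
SEMANTICS = ["PUSH", "ADD", "SUB", "MUL", "DUP", "SWAP", "LOAD", "STORE"]


def random_mapping(rng):
    """Random bijection from opcode values 0..n-1 to semantic operations."""
    opcodes = list(range(len(SEMANTICS)))
    rng.shuffle(opcodes)
    return dict(zip(opcodes, SEMANTICS))


def transfer_accuracy(learned, target):
    """Fraction of opcode values whose learned meaning is still correct."""
    return sum(learned[op] == target[op] for op in target) / len(target)


rng = random.Random(2019)

# Static strategy: all protected programs share one fixed mapping, so a
# mapping recovered from program A decodes program B perfectly.
fixed = random_mapping(rng)
print("static:", transfer_accuracy(fixed, fixed))  # static: 1.0

# Diversified strategy: every partition of program B gets a fresh mapping,
# so the mapping learned from program A is right only by chance.
learned_from_a = random_mapping(rng)
partitions_of_b = [random_mapping(rng) for _ in range(4)]
for i, mapping in enumerate(partitions_of_b):
    print(f"partition {i}:", transfer_accuracy(learned_from_a, mapping))

# Averaged over many trials, accuracy approaches 1 / len(SEMANTICS).
trials = [transfer_accuracy(random_mapping(rng), random_mapping(rng))
          for _ in range(10000)]
print("mean:", sum(trials) / len(trials))
```

Two independent random permutations agree in one position on average, so in this toy model the expected accuracy of a reused mapping is about 1/n for n opcodes (0.125 here), which is one way to read the claim that previously learned analysis knowledge is invalidated.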

BACKGROUND

MOTIVATION

VIRTUAL INSTRUCTION SET AND HANDLERS

EFFECTIVENESS EVALUATION

RELATED WORK

CONCLUSION