Abstract

Binary lifters convert executables into an intermediate representation (IR) of a compiler framework. The recovered IR code is generally deemed “analysis friendly,” bridging low-level code analysis with well-established compiler infrastructures. With years of development, binary lifters are becoming increasingly popular for use in various security, systems, and software (re)-engineering applications. Recent studies have also reported highly promising results that suggest binary lifters can generate LLVM IR code with correct functionality, even for complex cases.This paper conducts an in-depth study of binary lifters from an orthogonal and highly demanding perspective. We demystify the “expressiveness” of binary lifters, and reveal how well the lifted LLVM IR code can support critical downstream applications in security analysis scenarios. To do so, we generate two pieces of LLVM IR code by compiling C/C++ programs or by lifting the corresponding executables. We then feed these two pieces of LLVM IR code to three keystone downstream applications (pointer analysis, discriminability analysis, and decompilation) and determine whether inconsistent analysis results are generated. We study four popular static and dynamic LLVM IR lifters that were developed by the industry or academia from a total of 252,063 executables generated by various compilers and optimizations and on different architectures. Our findings show that modern binary lifters afford IR code that is highly suitable for discriminability analysis and decompilation, and suggest that such binary lifters can be applied in common similarity- or code comprehension-based security analysis (e.g., binary diffing). However, the lifted IR code appears unsuited to rigorous static analysis (e.g., pointer analysis). To obtain a more comprehensive view of the utility of binary lifters, we also compare the performance of lifter-enabled approaches with that of binary-only tools in three security tasks, i.e., sanitization, binary diffing, and C decompilation. We summarize our findings and make suggestions for the correct use and further enhancement of binary lifters. We also explored practical ways to enhance the accuracy of pointer analysis using lifted IR code, by using and augmenting Debin, a tool for predicting debug information.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call