Abstract

The mathematical foundations of abstract interpretation provide a unified method of formalization and research of program analysis algorithms for a broad spectrum of practical problems. However, its practical usage for binary code analysis faces several challenges, of both scientific and engineering nature. In this paper we address some of those challenges. We describe an intermediate representation that is tailored to binary code analysis; unlike some other IRs it is still useable in system code analysis. To achieve this, we take into account the low-level specifics of how CPUs work; on the IR level this mostly pertains to modeling main memory in that accesses can fail, and addresses can alias. Further, we propose an infrastructure for carrying out abstract interpretation on top of the IR. The user needs to implement the abstract state and the transfer functions, and the infrastructure handles the rest: two executors are currently implemented, one for analysis of a single path, and one for fixed point analysis. Both executors handle interprocedural analysis internally, via inlining or using summaries, so the interpretations only consider only procedure at a time, which greatly simplifies implementation. The IR and the abstract interpretation framework are used together to define a model pipeline for a target instruction set architecture, consisting of a fetch stage, a decode stage, and an execute stage. A distinct fetch stage allows to model delay slots, hardware loops, etc. We currently have limited implementations for RISC-V and x86. The x86 implementation is evaluated in two experiments where concolic execution is used to automatically analyze a «crackme» program, both in dynamic (execution trace) and static (executable image) setting. In conclusion, we outline the future directions of our project.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call