Static Analysis (SA) is the practice of examining computer program source code for errors or vulnerabilities that lie beyond the compiler’s capabilities to detect. Various tools exist to carry out static analysis; they parse and examine each line of a program for known issues. These tools neither compile nor run the program. Instead, they analyse the source code directly and infer properties of the program without executing it, hence “static” analysis. Static analysis is not new. Early UNIX contained a static analysis program called “lint”, a tool that has existed since the late 1970s (Johnson 1978). Increasingly, however, modern static analysis practice has a more focused intent: finding cybersecurity weaknesses. At the 22nd European Conference on Cyber Warfare and Security, the authors presented the background of Intermediate Representations (IRs) and described how this “middleware” representation can be used for static analysis of source code for cybersecurity weaknesses. By examining an IR, potential flaws in the source code can be located. When the IR is analysed in place of the original high-level language, the static analysis process becomes independent of the source language: if several languages such as C, Rust, and others all compile into the same IR, analysis of that IR is no longer tied to the grammar or syntax of any one high-level language. The previous paper conducted a literature survey of available IR analysis tools to discover prior work; the authors have subsequently advanced the research and are actively using an IR framework, LLVM, for vulnerability analysis. In this research, source code in a high-level language is first compiled to LLVM IR, and the resulting IR is used for analysis. This approach follows a “code-to-IR” paradigm for preparing input to static analysis. At the same time, binary “lifters” could also be used. 
These tools “lift” an executable program (binary machine instructions) back to LLVM. In this way, the paradigm can also be reversed, so that static analysis of LLVM can be performed either on source code compiled into LLVM or on executable programs in the field that have been “lifted”. This raises the question: how effective are these “lifters”? In this work, the authors present their experiences installing and operating several binary “lifters” available as open-source projects. Some are supported better than others, some operate better than others, and some do not operate at all. Those that do operate lead to a follow-up question: is the “lifted” code suitable for static analysis, or is it too obfuscated relative to the original program? This paper describes our efforts and results in locating a binary “lifter” suitable for bringing executable program test cases back to LLVM for analysis by the cybersecurity vulnerability tool concurrently under development.