Abstract

Binary type inference is a challenging problem, partly because much type-related information is lost during compilation. Most existing research resorts to program analysis techniques, which can be either too heavyweight to be viable in practice or too conservative to infer types with high accuracy. In this paper, we propose a new approach to learning types for binary code. Motivated by "duck typing," our approach learns types for recovered variables from their features and properties (e.g., related representative instructions). We first use machine learning to train a classifier, with basic types as its classes, on binaries with debugging information. The classifier is then used to infer types for new, unseen binaries. For composite types, such as pointer and struct, we perform a points-to analysis. Finally, we conduct several experiments to evaluate our approach. The results demonstrate that our approach is more precise, in terms of both correct types and compatible types, than the commercial tool Hex-Rays, the open-source tool Snowman, and EKLAVYA, a recent tool that uses machine learning. We also show that the type information our proposed system learns can help detect malware.
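To make the "duck typing" idea concrete, the following minimal sketch labels a recovered variable by the instructions that touch it. It is an illustration under stated assumptions, not the classifier used in the paper: the training samples, the mnemonic lists, and the helper names (`train`, `predict`, `cosine`) are all hypothetical, and a simple nearest-centroid rule with cosine similarity stands in for whatever learning algorithm the authors actually use.

```python
import math
from collections import Counter

# Hypothetical training data: each recovered variable is described by the
# mnemonics of the instructions that use it, and labeled with the basic type
# that debugging information would provide.
TRAINING = [
    (["movss", "addss", "mulss"], "float"),
    (["movss", "cvtss2sd", "divss"], "float"),
    (["mov", "add", "imul", "cmp"], "int"),
    (["mov", "sub", "cmp", "add"], "int"),
    (["movzx", "movsx", "cmp"], "char"),
    (["movzx", "mov", "test"], "char"),
]


def cosine(a, b):
    """Cosine similarity between two bag-of-mnemonics Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Sum the mnemonic counts per type label to form one centroid per type."""
    centroids = {}
    for instructions, label in samples:
        centroids.setdefault(label, Counter()).update(instructions)
    return centroids


def predict(centroids, instructions):
    """Return the type whose centroid is most similar to the variable's usage."""
    features = Counter(instructions)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))


model = train(TRAINING)
print(predict(model, ["movss", "mulss"]))       # variable used like a float
print(predict(model, ["add", "imul", "cmp"]))   # variable used like an int
```

The sketch covers only basic types. Composite types such as pointer and struct would need the separate points-to analysis that the abstract mentions, which is not modeled here.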
