Abstract

API functions often require the crafting of specific inputs and may return some output that is usually processed by the code that immediately follows their invocation. In this work, we claim that, for some APIs, those two stages are both frequently similar across different binaries and sufficiently unique to be fingerprinted.

We build upon this intuition and present Apícula, a static analysis tool for identifying API calls in generic streams of bytes, such as memory dumps, network traffic, or object code files. In a nutshell, Apícula leverages the control flow graph of a binary to generate a set of fingerprints for all basic blocks that end with a call instruction. Those sets are then compared against a database of pre-computed fingerprints to establish whether any known API is being invoked. Due to its applicability to unstructured byte streams, Apícula can complement the reverse engineering process when this is carried out over memory dumps collected after a cyber-incident. Moreover, it can enable behavioral analysis in a fully static way, by identifying sequences of API calls even in non-executable binaries.

We provide a series of experiments that are instrumental (1) in demonstrating that the same fingerprints computed for specific APIs can be observed across different binaries and (2) in identifying a subset of the Windows APIs whose usage can be detected by Apícula with sufficient precision and sensitivity, focusing in particular on malicious binaries. Furthermore, we illustrate two techniques that can be used to validate different fingerprint databases in case someone wants to detect APIs belonging to libraries different from those that we consider in this work.

In particular, we prove that fingerprints associated with different APIs are remarkably dissimilar and therefore can be employed for distinguishing between APIs. More specifically, we find that fingerprint sets associated with different APIs present on average a Jaccard index value of 0.000125; in comparison, the average similarity between fingerprint sets associated with the same API is 0.29 (Jaccard index) for binaries compiled with the same optimization level and 0.07 (Jaccard index) for binaries compiled with different optimization levels. Moreover, we show that we can build databases of fingerprints that are sufficiently comprehensive to identify specific APIs in unseen binaries. More precisely, we identify 228 different APIs among the Windows APIs (including the C run-time libraries) whose usage can be detected by Apícula with sensitivity greater than 80% and a false discovery rate lower than 5%.
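
The quantities reported above (Jaccard index between fingerprint sets, sensitivity, and false discovery rate) can be illustrated with a minimal Python sketch. It is not Apícula's implementation: the abstract does not specify how fingerprints are encoded or how a match is decided, so the string fingerprints, the `best_match` helper, the threshold value, and the toy database below are all assumptions made purely for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken to be 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_match(
    observed: Set[str],
    database: Dict[str, Set[str]],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Hypothetical matcher: return the database entry most similar to the
    observed fingerprint set, or None if its score is below the threshold."""
    best: Optional[Tuple[str, float]] = None
    for api_name, fingerprints in database.items():
        score = jaccard(observed, fingerprints)
        if best is None or score > best[1]:
            best = (api_name, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real API calls that were detected: TP / (TP + FN)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of reported detections that were wrong: FP / (TP + FP)."""
    total = true_pos + false_pos
    return false_pos / total if total else 0.0


# Toy database: the API names are real Windows APIs, but the fingerprint
# strings are placeholders and say nothing about Apícula's actual database.
toy_database = {
    "CreateFileW": {"fp1", "fp2", "fp3"},
    "VirtualAlloc": {"fp4", "fp5"},
}
observed_block = {"fp1", "fp2", "fp9"}
match = best_match(observed_block, toy_database, threshold=0.25)
```

In this toy example the observed set shares two of four distinct fingerprints with the `CreateFileW` entry and none with `VirtualAlloc`, so `match` is `("CreateFileW", 0.5)`. The large gap the abstract reports between same-API and different-API similarity is what would make a threshold-based decision of this kind plausible.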
