Abstract

In this study, we present a novel representation for binary programs that captures both semantic similarity and structural properties. This representation enables the search and retrieval of binary executables based on the similarity of their behavioral properties. The representation is built bottom-up: we begin by extracting data dependency graphs (DDGs), which are representative of both program structure and operational semantics. We then encode each program as a set of graph hashes, one per isomorphism class of its DDGs, a method we have labeled DDG Fingerprinting. We present experimental results of search using k-Nearest Neighbors in a metric space constructed from a set of binary executables. Searches in the dataset are based on the operational semantics of specific malware examples. By quantifying behavioral similarity, we show that we can recognize patterns of operation in novel malware whose functionality has not previously been identified. We show in addition that the associated metric space allows an adjustable level of resolution. The resolution of the features may be decreased for breadth of search and retrieval or, as the search space is reduced, increased for accuracy and fine-grained analysis of malware behavior. Because the features remain interpretable at fine resolution, the results of this fine-grained analysis are explainable.
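The pipeline described above (hash each DDG so that isomorphic graphs collide, represent a program as the set of those hashes, then search with k-Nearest Neighbors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the hashing scheme or the distance, so a Weisfeiler-Lehman-style label refinement stands in for the graph hash, Jaccard distance stands in for the metric, and the `rounds` parameter is a hypothetical stand-in for the adjustable resolution. The toy graphs and opcode labels are invented for the example.

```python
import hashlib


def _h(text):
    """Short, stable hex digest of a string."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def wl_hash(nodes, edges, rounds=3):
    """Hash a labeled directed graph so that isomorphic graphs collide.

    nodes: dict of node id -> label (here, an opcode string).
    edges: list of (src, dst) data-dependency pairs.
    rounds: refinement depth; an ASSUMED stand-in for "resolution".
    Weisfeiler-Lehman refinement is invariant under isomorphism but is
    not a complete isomorphism test: rare non-isomorphic graphs can collide.
    """
    labels = dict(nodes)
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
    for _ in range(rounds):
        refined = {}
        for n in nodes:
            # New label = own label plus sorted labels of inputs and outputs.
            ins = ",".join(sorted(labels[p] for p in preds[n]))
            outs = ",".join(sorted(labels[s] for s in succs[n]))
            refined[n] = _h(labels[n] + "|" + ins + "|" + outs)
        labels = refined
    # Node ids are discarded; only the multiset of final labels matters.
    return _h(";".join(sorted(labels.values())))


def fingerprint(ddgs, rounds=3):
    """A program's fingerprint: the set of hashes of its DDGs."""
    return frozenset(wl_hash(n, e, rounds) for n, e in ddgs)


def jaccard_distance(a, b):
    """Jaccard distance between two hash sets (a true metric on sets)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def knn(query, corpus, k):
    """Names of the k corpus programs closest to the query fingerprint."""
    return sorted(corpus, key=lambda name: jaccard_distance(query, corpus[name]))[:k]


# Toy DDGs (hypothetical): a load -> add -> store chain, the same chain with
# renamed nodes, a chain with a different opcode, and a fan-out variant.
g_chain = ({"a": "load", "b": "add", "c": "store"}, [("a", "b"), ("b", "c")])
g_chain_renamed = ({"x": "store", "y": "load", "z": "add"}, [("y", "z"), ("z", "x")])
g_xor = ({"a": "load", "b": "xor", "c": "store"}, [("a", "b"), ("b", "c")])
g_fanout = ({"a": "load", "b": "add", "c": "store"}, [("a", "b"), ("a", "c")])

corpus = {
    "sample_A": fingerprint([g_chain_renamed, g_xor]),
    "sample_B": fingerprint([g_fanout]),
}
nearest = knn(fingerprint([g_chain]), corpus, k=1)
```

In this sketch, lowering `rounds` coarsens the features: with `rounds=0` the chain and fan-out graphs hash identically because only their opcode multisets are compared, while `rounds=1` already separates them. That mirrors the breadth-versus-precision trade-off the abstract describes, though the mechanism the authors actually use may differ.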
