Abstract

Approximate matching (a.k.a. fuzzy hashing or similarity hashing) is often considered for malware and binary analysis. Recent research revealed major weaknesses of predominant fuzzy hashing techniques when measuring the similarity of executables (Pagani et al., 2018). In short, well-known Context-Triggered Piecewise Hashing approaches are not very reliable for binary comparison, as even benign changes heavily impact the underlying byte representation of the original binary. Such modifications can be caused by benign or malicious source code changes, different compilers, or changed compiler settings. Approaches based on the extraction of statistically improbable features (Roussev, 2010) or n-gram histograms (Oliver et al., 2013) showed better detection performance when inexactly matching binaries with varying build settings or source code modifications. However, inexact matching of binaries cannot support more precise inferences, i.e., it cannot highlight offsets changed at the byte level or slight variations within a modified binary. In this work we present apx-bin: an approximate matching implementation for binary analysis and binary matching. Our approach unites exact and inexact matching capabilities. A first comparison of our approach against four different fuzzy hashing techniques showed major advantages in nearly all of the mentioned scenarios. Previous research underlines how volatile existing schemes are across different scenarios. In contrast, apx-bin is more robust and shows stable results across all considered scenarios. Our scheme, based on code- and data-related feature extraction, can be further utilized as an independent digest or integrated into existing schemes.
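
To make the idea of inexact matching over n-gram histograms concrete, the following is a minimal sketch, not the apx-bin scheme and not the digest of Oliver et al. (2013). It assumes a plain histogram of overlapping byte 4-grams compared by cosine similarity; the function names `ngram_histogram` and `cosine_similarity`, the n-gram length, and the toy inputs are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def ngram_histogram(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping byte n-gram in data (n = 4 is an assumed choice)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse histograms; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for binaries (assumed data, not real executables).
original = bytes(range(256)) * 4
# Three bytes inserted near the start shift every later offset.
modified = original[:10] + b"\x90\x90\x90" + original[10:]
# A byte sequence that shares no 4-gram with the original.
unrelated = bytes((i * 7 + 3) % 256 for i in range(1024))

sim_modified = cosine_similarity(ngram_histogram(original), ngram_histogram(modified))
sim_unrelated = cosine_similarity(ngram_histogram(original), ngram_histogram(unrelated))
print(f"original vs. modified:  {sim_modified:.4f}")
print(f"original vs. unrelated: {sim_unrelated:.4f}")
```

The histogram discards positions, so the small insertion barely lowers the score even though it shifts all subsequent offsets. The same property illustrates the limitation noted above: a single similarity score of this kind cannot indicate which offsets changed.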
