BinCodex: A comprehensive and multi-level dataset for evaluating binary code similarity detection techniques

Peihua Zhang, Chenggang Wu, Zhe Wang

doi:10.1016/j.tbench.2024.100163


Open Access

https://doi.org/10.1016/j.tbench.2024.100163


Abstract

Binary code similarity detection (BCSD) quantitatively measures the differences between two given binaries and reports matching results at a predefined granularity (e.g., function). It is widely used in scenarios including software vulnerability search, security patch analysis, malware detection, and code clone detection. With the help of deep learning, BCSD techniques have achieved high accuracy in their own evaluations. However, on the one hand, because there is no standard dataset, these high accuracy figures are hard to tell apart and therefore fail to reveal the techniques' true abilities. On the other hand, since binary code can be easily changed, it is essential to gain a holistic understanding of the underlying transformations, including default optimization options, non-default optimization options, and commonly used code obfuscations, in order to assess their impact on the accuracy and adaptability of BCSD techniques. This paper presents our observations regarding the diversity of BCSD datasets and proposes a comprehensive dataset for BCSD. We evaluate various BCSD works and present detailed results, applying different classifications for different types of BCSD tasks, including pure function pairing and vulnerable code detection. Our results show that most BCSD works cope well with default compiler options but perform unsatisfactorily when facing non-default compiler options and code obfuscation. We take a layered perspective on the BCSD task and point to opportunities for future optimizations in the technologies we consider.
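
To make the function-pairing task mentioned in the abstract concrete, the following is a minimal sketch of how function-level matching between two builds of the same program could be scored. It is not the method of the paper or of any evaluated tool. It assumes each function has already been reduced to a numeric feature vector, uses cosine similarity as a stand-in similarity measure, and reports recall@1 (the fraction of query functions whose top-ranked candidate is the true match). The function names, vectors, and the `O0`/`O2` labels are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, pool):
    """Return the pool's function names, most similar to the query first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in pool.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def recall_at_1(queries, pool):
    """Fraction of query functions whose top-ranked candidate has the same name."""
    hits = sum(
        1 for name, vec in queries.items() if rank_candidates(vec, pool)[0] == name
    )
    return hits / len(queries)


# Hypothetical feature vectors for the same three functions, as they might
# look when extracted from builds at two different optimization levels.
O0 = {
    "parse_header": [12.0, 3.0, 5.0],
    "checksum": [4.0, 1.0, 9.0],
    "copy_buf": [6.0, 2.0, 1.0],
}
O2 = {
    "parse_header": [9.0, 2.0, 4.0],
    "checksum": [3.0, 1.0, 7.0],
    "copy_buf": [5.0, 2.0, 1.0],
}

if __name__ == "__main__":
    print("recall@1:", recall_at_1(O0, O2))
```

In this toy setup the two builds differ only slightly, so every query ranks its true counterpart first. Stronger transformations such as non-default compiler options or obfuscation would be expected to distort the vectors more and lower the score.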


Journal: BenchCouncil Transactions on Benchmarks, Standards and Evaluations
Publication Date: May 21, 2024
License type: CC BY-NC-ND

Similar Papers

An Inclusive Report on Robust Malware Detection and Analysis for Cross-Version Binary Code Optimizations
S Poornima, R Mahalakshmi
International Journal on Recent and Innovation Trends in Computing and Communication | VOL. 11
30 Oct 2023

Binary Code Representation With Well-Balanced Instruction Normalization
Hyungjoon Koo ... Taesoo Kim
IEEE Access | VOL. 11
01 Jan 2023

Research and implementation of obfuscation binary code similarity detection
Yang Zhang ... Can Cui
09 Dec 2022

αDiff: cross-version binary code similarity detection with DNN
Bingchang Liu ... Feng Li
03 Sep 2018
