Abstract

Binary code search is a technique that involves finding code with similarity to a given code within a code database. It finds extensive application in scenarios such as vulnerability queries and code defect analysis. While many existing methods employ advanced machine learning models for similarity analysis, their lack of interpretability and low efficiency in dealing with large-scale functions still remain challenges. To address these issues, we propose a high-efficiency binary code search method called HEBCS. It employs an interpretable approach to extract function-level features and transforms each feature into a locality-sensitive hash representation. Then, the hashes of these features are combined to form the hash of the function. By leveraging the pigeonhole principle, HEBCS enables efficient storage and retrieval of functions, ensuring high execution efficiency even in the presence of large-scale data. Furthermore, we compare HEBCS with a classic method and a state-of-the-art method, demonstrating that HEBCS achieves significantly higher search efficiency while maintaining a comparable accuracy, recall and F1-score. In real-world vulnerability query applications, HEBCS demonstrated promising results. Its effectiveness in large-scale binary function searches suggests significant potential for practical applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call