Abstract
Binary code search is a technique for finding code similar to a given query within a code database. It is widely applied in scenarios such as vulnerability queries and code defect analysis. While many existing methods employ advanced machine learning models for similarity analysis, their lack of interpretability and their low efficiency on large-scale function sets remain challenges. To address these issues, we propose a high-efficiency binary code search method called HEBCS. It employs an interpretable approach to extract function-level features and transforms each feature into a locality-sensitive hash representation. The hashes of these features are then combined to form the hash of the function. By leveraging the pigeonhole principle, HEBCS enables efficient storage and retrieval of functions, maintaining high execution efficiency even on large-scale data. We compare HEBCS with a classic method and a state-of-the-art method, showing that HEBCS achieves significantly higher search efficiency while maintaining comparable accuracy, recall, and F1-score. In real-world vulnerability query applications, HEBCS demonstrates promising results. Its effectiveness in large-scale binary function search suggests significant potential for practical applications.
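The abstract does not specify how the pigeonhole principle is applied, so the following is only a minimal sketch of one common way it is used for hash retrieval, not the HEBCS implementation. It assumes function hashes are fixed-width integers compared by Hamming distance: if a hash is split into k+1 segments and two hashes differ in at most k bits, at least one segment must match exactly, so exact-match lookups on segments yield a small candidate set that is then verified. The names `PigeonholeIndex`, `HASH_BITS`, and `MAX_DIST`, as well as the 64-bit width and threshold of 3, are hypothetical choices made for illustration.

```python
from collections import defaultdict

HASH_BITS = 64                      # assumed width of a function hash
MAX_DIST = 3                        # assumed Hamming-distance threshold k
SEGMENTS = MAX_DIST + 1             # k differing bits cannot touch all k+1 segments
SEG_BITS = HASH_BITS // SEGMENTS    # 16 bits per segment


def segments(h):
    """Split a HASH_BITS-wide integer into SEGMENTS equal-width chunks."""
    mask = (1 << SEG_BITS) - 1
    return [(h >> (i * SEG_BITS)) & mask for i in range(SEGMENTS)]


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Hypothetical index: one exact-match table per segment position."""

    def __init__(self):
        # tables[i] maps the value of segment i to the names of functions having it
        self.tables = [defaultdict(set) for _ in range(SEGMENTS)]
        self.hashes = {}

    def add(self, name, h):
        self.hashes[name] = h
        for i, seg in enumerate(segments(h)):
            self.tables[i][seg].add(name)

    def query(self, h):
        # Collect every function sharing at least one segment with the query,
        # then keep only those within the Hamming threshold.
        candidates = set()
        for i, seg in enumerate(segments(h)):
            candidates |= self.tables[i].get(seg, set())
        return sorted(n for n in candidates
                      if hamming(self.hashes[n], h) <= MAX_DIST)
```

Under these assumptions, a query touches only `SEGMENTS` table lookups plus a verification pass over the candidates, rather than a comparison against every stored function, which is the kind of saving the abstract attributes to the pigeonhole principle.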