Abstract

Cross-architecture binary code similarity detection is widely used in vulnerability discovery, reverse engineering, and patch detection. Identifying the compilation information of a binary file helps improve the accuracy of binary code similarity detection. This compilation information comprises the compilation architecture, the compiler, the optimization option, and the obfuscation strategy. To identify the compilation architecture, we build an architecture feature library based on the ELF header information of the binary file; to identify the compiler, we use Linux system commands; to identify the optimization option and the obfuscation strategy, we extract 70 static function-level features from the binary file's assembly code and build a genetic neural network model. In addition, we set up five experimental tasks to examine how the compilation architecture and the compiler affect the model's identification of optimization options and obfuscation strategies. The experimental results show that our compilation information identification model reaches 100% accuracy in identifying both the compilation architecture and the compiler. The F-score reaches 89.46% for optimization option identification, 88.74% for obfuscation strategy identification, and 84.28% for simultaneous identification of optimization options and obfuscation strategies, which is significantly better than previous work.
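
To make the ELF-header step concrete, the following is a minimal sketch of how a compilation architecture could be read from the header fields `EI_CLASS`, `EI_DATA`, and `e_machine`. It is not the feature library built in this work, whose contents the abstract does not specify: the function names `identify_architecture` and `identify_file` and the small `E_MACHINE_NAMES` table are assumptions made for illustration, and only the field offsets and machine codes come from the ELF specification.

```python
import struct

# Illustrative subset of e_machine codes from the ELF specification.
# A real feature library would cover more architectures (assumption).
E_MACHINE_NAMES = {
    3: "x86",
    8: "MIPS",
    20: "PowerPC",
    40: "ARM",
    62: "x86-64",
    183: "AArch64",
}


def identify_architecture(header: bytes) -> dict:
    """Read word size, endianness, and machine type from an ELF header.

    Only the first 20 bytes are needed: the 16-byte e_ident array,
    followed by the 2-byte e_type and 2-byte e_machine fields.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")

    bits = {1: 32, 2: 64}.get(header[4])          # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA
    if bits is None or byte_order is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")

    # e_type and e_machine are stored in the file's own byte order.
    _e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)

    return {
        "bits": bits,
        "endianness": "little" if byte_order == "<" else "big",
        "machine": E_MACHINE_NAMES.get(e_machine, f"unknown ({e_machine})"),
    }


def identify_file(path: str) -> dict:
    """Hypothetical convenience wrapper that reads the header from disk."""
    with open(path, "rb") as f:
        return identify_architecture(f.read(20))
```

Because these header fields are written by the toolchain and fixed by the file format, a lookup of this kind is deterministic, unlike the learned model used for optimization options and obfuscation strategies.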
