Abstract

Code similarity measures create a comparison metric showing to what degree two code samples have the same functionality, e.g., to statically detect the use of known libraries in binary code. They are both an indispensable part of automated malware analysis, as well as a helper for the detection of plagiarism (IP protection) and the illegal use of open-source libraries in commercial apps. The centroid similarity metric extracts control-flow features from binary code and encodes them as geometric structures before comparing them. In our paper, we propose novel improvements to the centroid approach and apply it to the ARM architecture for the first time. We implement our approach as a plug-in for the IDA Pro disassembler and evaluate it regarding efficiency, accuracy and robustness on Android. Based on a dataset of 508,745 APKs, collected from 18 third-party app markets, we achieve a detection rate of 89% for the use of native code libraries, with an FPR of 10.8%. To test the robustness of our approach against the compiler version, optimization level, and other code transformations, we obfuscate and recompile known open-source libraries to evaluate which code transformations are resisted. Based on our results, we discuss how code re-use can be hidden by obfuscation and conclude with possible improvements.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.