Abstract

Binary code analysis is vital when source code is unavailable, as in malware analysis and software vulnerability mining. Function identification is often its first step. Most function identification methods rely on function prologs and epilogs, but not every function has a standard prolog or epilog, so other methods are needed to identify such functions. One approach is to identify return instructions first and then locate the start of each function. Currently, a multi-layer perceptron model is used to identify and validate a return instruction at a specific location. Building on this, a new approach is proposed to improve accuracy and provide more detail. Specifically, each candidate return instruction is assigned to one of three classes: (1) a false return instruction, (2) a true return instruction inside a function that is not its last instruction, and (3) a true return instruction at the end of a function. The evaluation is performed on 5782 real-world binaries. Common classifiers, including a fully connected neural network, Two-layer Bidirectional Recurrent Neural Network (TBRNN), Two-layer Bidirectional Gated Recurrent Unit (TBGRU), Two-layer Bidirectional Long Short-Term Memory Network (TBLSTM), Decision Tree, Random Forest, XGBoost, and Support Vector Machine (SVM), are evaluated on the same data set. The results show that TBLSTM achieves an accuracy of 99.78%, which is higher than that of every other method in the evaluation, including the state-of-the-art tool IDA Pro 7.7.
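To make the three-class formulation concrete, the following is a minimal sketch of how candidate return instructions might be collected before classification. It is not the paper's implementation. It assumes x86/x86-64 code, where the byte 0xC3 encodes a near `ret`; the names `RetClass` and `find_ret_candidates`, the context window size, and the zero padding are all hypothetical choices made for illustration.

```python
from enum import IntEnum
from typing import List, Tuple

# Assumption: x86/x86-64, where 0xC3 is the one-byte near `ret` opcode.
RET_OPCODE = 0xC3


class RetClass(IntEnum):
    """The three classes named in the abstract (numeric values are arbitrary)."""
    FALSE_RETURN = 0   # byte looks like `ret` but is not a real return
    INNER_RETURN = 1   # real return inside a function, not its last instruction
    FINAL_RETURN = 2   # real return at the end of a function


def find_ret_candidates(code: bytes, window: int = 8) -> List[Tuple[int, bytes]]:
    """Return (offset, context) for every byte equal to RET_OPCODE.

    The context holds `window` bytes on each side of the candidate,
    zero-padded at the section boundaries (a hypothetical choice), so
    every context has length 2 * window + 1 and could be fed to a
    sequence classifier.
    """
    padded = bytes(window) + code + bytes(window)
    candidates = []
    for offset, byte in enumerate(code):
        if byte == RET_OPCODE:
            # In `padded`, the candidate sits at index offset + window.
            context = padded[offset:offset + 2 * window + 1]
            candidates.append((offset, context))
    return candidates


# push rbp; mov rbp, rsp; ret; nop; mov eax, 0xC3; ret
sample = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3, 0x90,
                0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3])
for offset, context in find_ret_candidates(sample, window=2):
    print(offset, context.hex())
```

In this sample, the byte at offset 7 is part of the immediate operand of `mov eax, 0xC3`, so a byte scan alone would report it as a return even though it is not one. Separating such false candidates from true returns is the job of the classifier described above.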
