Abstract

The rapid expansion of the open-source community has shortened the software development cycle, but it has also accelerated the spread of vulnerabilities, especially in the field of the Internet of Things. In recent years, the frequency of attacks against connected devices has been increasing exponentially, which makes these vulnerabilities more serious. State-of-the-art firmware security inspection technologies, such as methods based on machine learning and graph theory, find similar applications based on known vulnerabilities but are ineffective without detailed information about those vulnerabilities. Moreover, the model training required by machine learning technologies demands a significant amount of time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow features and data flow features are extracted from the functions of the firmware and of the vulnerabilities, and these features are used to calculate the SimHash of each function. Mass storage and fast querying of the SimHash values are implemented using the pigeonhole principle. Second, the similar function pairs are analyzed in detail within and among the basic blocks. Within the basic blocks, symbolic execution is used to generate the semantic information of each basic block, and a constraint solver is used to determine semantic equivalence. Among the basic blocks, the local control flow graphs are analyzed to obtain their similarity. We then implement a prototype and present its evaluation. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis. It can also locate a real-world firmware patch without vulnerability function information. Finally, we compare our method with existing methods. The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod. More than 90% of the firmware functions can be indexed within 0.1 s, while the search time over 100,000 firmware functions is less than 2 s.
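The SimHash and pigeonhole steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 64-bit width, the MD5-based token hash, the Hamming threshold `k = 3`, the class name `SimHashIndex`, and the feature tokens are all assumptions chosen for illustration. The idea it shows is that two hashes differing in at most `k` bits must agree exactly on at least one of `k + 1` disjoint blocks, so each block can serve as an exact-match lookup key.

```python
import hashlib
from collections import defaultdict

BITS = 64  # assumed SimHash width


def simhash(features):
    """Compute a SimHash from (token, weight) pairs.

    The tokens stand in for control flow and data flow features;
    their format here is hypothetical.
    """
    counts = [0] * BITS
    for token, weight in features:
        digest = hashlib.md5(token.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            counts[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


class SimHashIndex:
    """Pigeonhole index: split each hash into k + 1 blocks.

    If two hashes differ in at most k bits, at least one block is
    identical, so an exact lookup per block finds every candidate.
    """

    def __init__(self, k=3):
        self.k = k
        self.blocks = k + 1
        self.width = BITS // self.blocks
        self.tables = [defaultdict(set) for _ in range(self.blocks)]
        self.hashes = {}

    def _keys(self, h):
        mask = (1 << self.width) - 1
        return [(h >> (i * self.width)) & mask for i in range(self.blocks)]

    def add(self, name, h):
        self.hashes[name] = h
        for table, key in zip(self.tables, self._keys(h)):
            table[key].add(name)

    def query(self, h):
        candidates = set()
        for table, key in zip(self.tables, self._keys(h)):
            candidates |= table.get(key, set())
        # Candidates share one block; confirm with the full distance.
        return sorted(n for n in candidates
                      if hamming(self.hashes[n], h) <= self.k)


# Usage with hypothetical feature tokens for one function.
index = SimHashIndex(k=3)
index.add("sub_401000", simhash([("cfg:blocks=5", 2), ("dfg:loads=3", 1)]))
matches = index.query(simhash([("cfg:blocks=5", 2), ("dfg:loads=3", 1)]))
```

Each query touches only `k + 1` hash tables instead of scanning every stored function, which is the property that makes lookups over a large function set cheap; the final Hamming check removes candidates that share a block but differ elsewhere.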

Highlights

  • In recent years, the scale of the open-source community has expanded rapidly, promoting the capabilities of the software development industry

  • Code similarity analysis is a common technique for malicious code analysis that can be used in firmware security analysis

  • The input and output of the basic blocks are expressed as symbols, and the constraint solver is used to determine whether the semantics of the two basic blocks are equivalent

  • The basic block transfer paths are analyzed to calculate the similarity of the local control flow graphs (LCFGs) of different functions
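The basic block equivalence check mentioned in the highlights can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: the instruction format `(dst, op, src1, src2)`, the 8-bit register width, and the function names are hypothetical, and exhaustive enumeration over all input values stands in for the constraint solver that a real system would call. The symbolic execution step does build output expressions over input symbols, as described.

```python
from itertools import product

WIDTH = 8  # toy register width (assumption), keeps enumeration cheap
MASK = (1 << WIDTH) - 1

# Toy instruction semantics on WIDTH-bit values.
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << (b % WIDTH)) & MASK,
}


def symbolic_execute(block, inputs):
    """Run a block over symbols and return register -> expression.

    block: list of (dst, op, src1, src2); a source is a register
    name (str) or an integer constant.
    """
    state = {reg: ("sym", reg) for reg in inputs}
    for dst, op, a, b in block:
        lhs = state[a] if isinstance(a, str) else ("const", a)
        rhs = state[b] if isinstance(b, str) else ("const", b)
        state[dst] = (op, lhs, rhs)
    return state


def evaluate(expr, env):
    """Evaluate a symbolic expression under concrete input values."""
    kind = expr[0]
    if kind == "sym":
        return env[expr[1]]
    if kind == "const":
        return expr[1] & MASK
    return OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


def equivalent(block_a, block_b, inputs, outputs):
    """True if both blocks give equal outputs for every input value.

    Exhaustive enumeration replaces the solver query in this sketch.
    """
    state_a = symbolic_execute(block_a, inputs)
    state_b = symbolic_execute(block_b, inputs)
    for values in product(range(1 << WIDTH), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        for out in outputs:
            if evaluate(state_a[out], env) != evaluate(state_b[out], env):
                return False
    return True


# Two syntactically different blocks that both compute r1 = 2 * r0.
by_mul = [("r1", "mul", "r0", 2)]
by_shift = [("r1", "shl", "r0", 1)]
same = equivalent(by_mul, by_shift, ["r0"], ["r1"])
```

Comparing input/output behavior rather than instruction text is what lets the check tolerate compiler-induced differences such as a multiplication replaced by a shift. Enumeration only scales to tiny widths, which is why a constraint solver is the practical choice for real register sizes.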

Summary

Introduction

The scale of the open-source community has expanded rapidly, promoting the capabilities of the software development industry. AVATAR [6] and AVATAR2 [12] are currently the most advanced platforms for dynamic firmware analysis; AVATAR runs the firmware alternately on physical devices and the QEMU emulator, and uses Selective Symbolic Execution (S2E) to perform symbolic execution and taint analysis to explore security issues. This method is too expensive to apply on a large scale. The overhead of learning-based analysis is also high, including model training and retraining, generation of feature embeddings, and data preparation, among others, which leads to poor extensibility. Existing firmware vulnerability detection technologies play an important role in the field of IoT security, but these technologies share one or more of the following shortcomings: incomplete features, high overhead, and poor extensibility. We design a basic block-level similarity analysis method that identifies the location of a firmware patch without the need for vulnerability function information. Because the proposed approach does not need model training, unknown firmware can be analyzed directly.

Firmware code similarity analysis
SimHash
Symbolic execution
3: Randomly select function f
SimHash function similarity
Basic block semantic similarity analysis
Implementation and evaluation
Dataset and evaluation criteria
Accuracy
Efficiency
Real-world case
The SimHash similarities between patched and unpatched httpd were compared
Discussion
Findings
New firmware code similarity analysis technology
Conclusions
