Abstract

Security vulnerabilities play a critical role in network security systems. Fuzzing is widely used as a vulnerability discovery technique so that flaws can be found and damage reduced in advance. However, traditional fuzz testing faces many challenges, such as how to mutate input seed files, how to increase code coverage, and how to bypass format verification effectively. Machine learning techniques have therefore been introduced into fuzz testing to alleviate these challenges. This paper reviews recent research progress on using machine learning techniques for fuzz testing, analyzes how machine learning improves the fuzzing process and its results, and sheds light on future work in fuzzing. First, this paper discusses why machine learning techniques can be applied to fuzzing scenarios and identifies five different stages in which machine learning has been used. It then systematically studies machine learning-based fuzzing models along five dimensions: selection of machine learning algorithms, pre-processing methods, datasets, evaluation metrics, and hyperparameter settings. Second, this paper assesses the performance of the machine learning techniques used for fuzz testing in existing research. The evaluation results indicate that machine learning techniques have an acceptable prediction capability for fuzzing. Finally, the vulnerability discovery capability of both traditional fuzzers and machine learning-based fuzzers is analyzed. The results show that introducing machine learning techniques can improve the performance of fuzzing. We hope to give researchers a systematic and more in-depth understanding of machine learning-based fuzzing, and to provide some references for this field through analysis and summarization across multiple dimensions.

Highlights

  • Vulnerabilities often refer to flaws or weaknesses in hardware, software, protocol implementations, or system security policies that allow an attacker to access or compromise the system without authorization, and have become the root cause of threats to network security.

  • For the first and second conditions, the fuzzing process is sufficient because fuzzing can produce a large number of test samples and crash samples, which can be labeled during sample generation.

  • We systematically reviewed the literature to analyze and assess the performance of machine learning techniques for fuzzing.


Summary

Introduction

Vulnerabilities often refer to flaws or weaknesses in hardware, software, protocol implementations, or system security policies that allow an attacker to access or compromise the system without authorization, and have become the root cause of threats to network security.

The working process of fuzzing is composed of four main stages: testcase generation, program execution, runtime state monitoring, and analysis of crashes. The testcase generation stage primarily provides input for the fuzzing process. It includes seed file generation, mutation, testcase generation, and testcase filtering. Mutating seed files can produce a large number of testcases by applying different mutation strategies at different locations [18]. Mutation-based testcase generation strategies generate new testcases by modifying known seed files. Generation-based testcase generation strategies generate new testcases from the format information of the input sample, without mutation.
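The four stages and the two testcase generation strategies described above can be illustrated with a toy example. The following is a minimal sketch, not a method taken from the reviewed literature: the input format (`b"HDR"` followed by a length byte and a payload), the program under test `toy_target`, and the helper names `mutate`, `generate`, and `fuzz` are all hypothetical and chosen only for illustration.

```python
import random


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Mutation-based generation: apply one random edit to a known seed."""
    data = bytearray(seed)
    strategy = rng.choice(("bitflip", "insert", "delete"))
    pos = rng.randrange(len(data)) if data else 0
    if strategy == "bitflip" and data:
        data[pos] ^= 1 << rng.randrange(8)   # flip one bit at a random location
    elif strategy == "insert":
        data.insert(pos, rng.randrange(256))  # insert one random byte
    elif strategy == "delete" and len(data) > 1:
        del data[pos]                         # drop one byte
    return bytes(data)


def generate(rng: random.Random) -> bytes:
    """Generation-based: build an input from the (assumed) format, no seed needed."""
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
    return b"HDR" + bytes([len(payload)]) + payload


def toy_target(data: bytes) -> None:
    """Hypothetical program under test: parses b'HDR' + length byte + payload."""
    if not data.startswith(b"HDR"):
        return  # format verification rejects the input early
    if len(data) < 4:
        raise IndexError("missing length byte")
    declared, payload = data[3], data[4:]
    if declared > len(payload):
        raise ValueError("declared length exceeds payload")


def fuzz(seed: bytes, iterations: int, rng_seed: int = 0) -> dict:
    """Run the four stages in a loop and return one testcase per crash signature."""
    rng = random.Random(rng_seed)
    crashes = {}
    for _ in range(iterations):
        testcase = mutate(seed, rng)               # 1. testcase generation
        try:
            toy_target(testcase)                   # 2. program execution
        except Exception as exc:                   # 3. runtime state monitoring
            signature = (type(exc).__name__, str(exc))
            crashes.setdefault(signature, testcase)  # 4. crash analysis (dedup)
    return crashes


if __name__ == "__main__":
    for signature, testcase in fuzz(b"HDR\x02ab", iterations=500).items():
        print(signature, testcase)
```

Here an uncaught exception stands in for a crash, and crash analysis is reduced to grouping by exception type and message. A real fuzzer would monitor a separate process for signals, sanitizer reports, or timeouts, and would usually track code coverage to decide which testcases to keep as new seeds.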

