Abstract

Fuzzing is a widely used technique for discovering software vulnerabilities. Many existing fuzzers leverage coverage feedback to evolve seeds toward maximizing program branch coverage. Recently, several techniques have proposed training deep learning models to predict the branch coverage of an arbitrary input. These techniques have demonstrated success in improving coverage and discovering bugs under various experimental settings. However, deep learning models, typically used as black boxes, are notoriously difficult to explain. Moreover, their performance can be sensitive to the runtime coverage information collected for training, suggesting potentially unstable behavior. To this end, in this work we conduct a systematic and extensive empirical study of 4 types of deep learning models across 6 projects to reproduce the actual performance of deep-learning-based fuzzers, analyze the advantages and disadvantages of applying deep learning to fuzzing, and explore future directions for combining the two. Our empirical results reveal that the deep learning models are effective only in very limited scenarios, largely constrained by training data imbalance, dependent labels, model over-generalization, and the insufficient expressiveness of state-of-the-art models. Consequently, the gradients estimated by the models for covering a branch can be unhelpful in many scenarios.
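To make the gradient-guidance idea concrete, the sketch below follows the style of neural-program-smoothing fuzzers such as NEUZZ: a surrogate network maps input bytes to a predicted branch-coverage bitmap, and the gradient of a target branch's predicted coverage with respect to the input bytes selects which byte offsets to mutate. This is a minimal, hypothetical PyTorch illustration, not code from the studied systems; INPUT_LEN, NUM_BRANCHES, the network shape, gradient_guided_mutants, and the mutation step size are all assumptions.

```python
import torch
import torch.nn as nn

INPUT_LEN = 512      # fixed-size byte input (assumed)
NUM_BRANCHES = 2048  # size of the branch-coverage bitmap (assumed)

# Surrogate model: input bytes -> predicted per-branch coverage probability.
model = nn.Sequential(
    nn.Linear(INPUT_LEN, 4096),
    nn.ReLU(),
    nn.Linear(4096, NUM_BRANCHES),
    nn.Sigmoid(),
)

def gradient_guided_mutants(seed: bytes, target_branch: int, top_k: int = 8):
    """Mutate the byte offsets whose gradient magnitude w.r.t. the
    target branch's predicted coverage is largest."""
    padded = seed[:INPUT_LEN].ljust(INPUT_LEN, b"\x00")
    x = torch.tensor(list(padded), dtype=torch.float32) / 255.0
    x.requires_grad_(True)
    # Gradient of the target branch's prediction w.r.t. each input byte.
    model(x)[target_branch].backward()
    hot = x.grad.abs().topk(top_k).indices  # most influential byte offsets
    mutants = []
    for direction in (+1, -1):  # perturb along and against the gradient
        buf = bytearray(padded)
        for i in hot.tolist():
            step = direction * int(torch.sign(x.grad[i]).item()) * 32
            buf[i] = max(0, min(255, buf[i] + step))
        mutants.append(bytes(buf))
    return mutants

# Example usage: propose mutants aimed at flipping a chosen branch.
mutants = gradient_guided_mutants(b"example seed input", target_branch=7)
```

The sketch also shows why the abstract's findings matter: if the surrogate's gradients are inaccurate, for example due to imbalanced coverage labels or over-generalization, the selected byte offsets will not actually flip the target branch.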
