Abstract
Greybox fuzzing aims to reveal software bugs by maximizing code coverage. It repeatedly executes the program under test (PUT) with test inputs generated from initial seeds, and its effectiveness at exposing bugs in real-world programs has made it a popular software testing technique. Although powerful, greybox fuzzing can be restricted in some cases. Because it ignores how significant the executed functions are, traditional greybox fuzzing usually fails to identify significant seeds, that is, seeds that execute more significant functions, and thus may assign similar energy to significant and trivial seeds during power scheduling. As a result, its effectiveness can degrade because too much energy is wasted on trivial seeds. In this paper, we introduce function significance (FS) to measure how significant a function is. Our key insight is that influential functions that connect to many other functions are significant to greybox fuzzing, as they offer a higher probability of reaching previously unexplored code regions. To quantify FS, we conduct influence analysis on the call graphs extracted from the PUTs to obtain the centrality values of function nodes. With FS as the significance measurement, we further propose FunFuzz, an FS-aware greybox fuzzing technique, to prioritize significant seeds and tackle the aforementioned restriction. To this end, FunFuzz dynamically tracks the functions executed by a seed during fuzzing and computes the significance score for the seed by accumulating the FS values of the functions it executes. Based on the computed FS values, FunFuzz then applies estimation-based power scheduling to assign more (or less) energy to seeds that achieve over-estimated (or under-estimated) significance scores.
Specifically, the seed energy is adjusted by multiplying it by a scale factor computed from the ratio of the actual significance score achieved by executing the seed to the estimated significance score predicted by a linear model constructed on the fly. To evaluate FunFuzz, we prototype it on top of AFL++ and conduct experiments with 15 programs, of which 10 are from common real-world projects and five are from Magma, and compare it to seven popular fuzzers. The experimental results, obtained from more than 40,800 CPU hours of fuzzing, show that: (1) In terms of code coverage, FunFuzz outperforms AFL++ by achieving 0.1% to 18.4% more region coverage on 13 out of 15 targets. (2) In terms of finding bugs, FunFuzz unveils 114 unique crashes and 25 Magma bugs (which are derived from CVEs) in 20 trials of 24-hour fuzzing. These are the most among all compared fuzzers and include 32 crashes and 1 Magma bug that the other fuzzers fail to discover. Besides the experiments focusing on code coverage and bug finding, we evaluate the key components of FunFuzz, namely the FS-centered estimation-based power scheduling and the lazy FS computation mechanism. The extensive evaluation not only suggests FunFuzz's superiority in code coverage and bug finding, but also demonstrates the effectiveness of the two components.
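The pipeline described above (per-function FS from call-graph centrality, a per-seed significance score, and an energy scale factor based on the ratio of actual to estimated score) can be illustrated with a minimal Python sketch. It is not FunFuzz's implementation, and several details are assumptions made only for illustration: normalized degree centrality stands in for the unspecified centrality measure, the linear model's single predictor `x` is a hypothetical seed feature (for example, the number of functions a seed executes), and the clamping bounds `lo` and `hi` are invented. All function and class names are hypothetical.

```python
def degree_centrality(call_graph):
    """Assumed FS stand-in: normalized (in + out) degree of each function.

    call_graph maps a caller name to an iterable of callee names.
    """
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes |= set(callees)
    degree = {name: 0 for name in nodes}
    for caller, callees in call_graph.items():
        for callee in set(callees):
            degree[caller] += 1
            degree[callee] += 1
    denom = max(len(nodes) - 1, 1)
    return {name: d / denom for name, d in degree.items()}


def seed_score(executed_functions, fs):
    """Accumulate the FS values of the distinct functions a seed executed."""
    return sum(fs.get(name, 0.0) for name in set(executed_functions))


class OnlineLinearModel:
    """Least-squares line y = a*x + b, maintained incrementally via running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict(self, x):
        if self.n == 0:
            return None  # no observations yet
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return self.sy / self.n  # all x equal: fall back to the mean
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope * x + intercept


def energy_scale(actual, estimated, lo=0.25, hi=4.0):
    """Scale factor = actual / estimated score, clamped to [lo, hi] (assumed bounds)."""
    if estimated is None or estimated <= 0:
        return 1.0  # no usable estimate: leave the energy unchanged
    return min(max(actual / estimated, lo), hi)


# Toy call graph: main calls parse and log; parse calls decode and log.
call_graph = {"main": ["parse", "log"], "parse": ["decode", "log"]}
fs = degree_centrality(call_graph)
score = seed_score(["main", "parse", "decode"], fs)

model = OnlineLinearModel()
model.update(1, 1.0)
model.update(3, 3.0)
factor = energy_scale(3.0, model.predict(2))
```

In this sketch, a seed whose actual score exceeds the model's estimate receives a factor above 1 and therefore more energy, while a seed that falls short receives a factor below 1. How the scale factor is actually shaped and bounded is not stated in the abstract.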