Abstract

Generation-based fuzzing is effective in testing programs that require highly structured inputs. However, building a new generator often requires substantial manual effort to summarise a large body of grammar rules so that the generator produces correct structures.

In this paper, we introduce the idea of generator reuse, which aims to avoid the manual effort required to build new generators. Our key insight is that for a format X (e.g., PDF) whose grammar rules are not yet supported by existing generators, we can often find generators that support the grammar rules of a different format Y (e.g., HTML), together with converters from Y to X (e.g., HTML-to-PDF converters). By reusing the generators for Y and the converters, we can assemble a generator that supports the grammar rules of X with little effort (e.g., given an HTML generator and an HTML-to-PDF converter, we can connect them to form a PDF generator).

To explore the validity of generator reuse, we apply the idea to build a PDF generator, reusing a popular HTML generator and a set of mainstream HTML-to-PDF converters. Through this application, we also gain an initial understanding of the limitations of generator reuse and introduce a set of strategies to mitigate them. We evaluated the resulting PDF generator, Arcee, on 6 popular, infrastructural PDF applications and libraries. Our empirical results show that the PDF generator is indeed useful: running it on the 6 applications and libraries for 2 weeks, we discovered 39 bugs, 28 of which are new and 23 of which have explicit security implications. Our empirical results also show that the PDF generator outperforms existing fuzzing tools that can be applied to PDF software, using code coverage and the number of discovered bugs as metrics.
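
The composition described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: `toy_html_generator`, `toy_html_to_pdf`, and `reuse_generator` are hypothetical names, the "converter" only wraps the HTML in PDF-like markers and does not emit a valid PDF, and picking a converter at random per input is an assumption made purely for illustration. The point is only that a generator for Y and a Y-to-X converter, chained together, behave as a generator for X.

```python
import random
from typing import Callable, List

# Type aliases for the two reusable pieces (names are illustrative).
Generator = Callable[[random.Random], str]   # produces an input in format Y
Converter = Callable[[str], bytes]           # converts format Y to format X


def toy_html_generator(rng: random.Random) -> str:
    """Hypothetical stand-in for an existing HTML generator (format Y)."""
    tags = ["p", "div", "span", "h1"]
    parts = []
    for _ in range(rng.randint(1, 5)):
        tag = rng.choice(tags)
        parts.append(f"<{tag}>text{rng.randint(0, 99)}</{tag}>")
    return "<html><body>" + "".join(parts) + "</body></html>"


def toy_html_to_pdf(html: str) -> bytes:
    """Hypothetical stand-in for a real HTML-to-PDF converter.

    It only wraps the HTML in PDF-like header/trailer markers;
    the result is NOT a valid PDF document.
    """
    return b"%PDF-1.7\n" + html.encode("utf-8") + b"\n%%EOF\n"


def reuse_generator(gen: Generator, converters: List[Converter]) -> Callable[[random.Random], bytes]:
    """Chain a format-Y generator with Y-to-X converters to get an X generator."""
    def generate(rng: random.Random) -> bytes:
        source = gen(rng)                    # step 1: generate a format-Y input
        converter = rng.choice(converters)   # step 2: pick a converter (assumed policy)
        return converter(source)             # step 3: convert it to format X
    return generate


# Assemble the "PDF generator" from the two reused parts.
pdf_generator = reuse_generator(toy_html_generator, [toy_html_to_pdf])
sample = pdf_generator(random.Random(0))
```

In a real setting, the two stand-ins would be replaced by an existing HTML generator and actual HTML-to-PDF converters, and each `sample` would be fed to the PDF software under test.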
