Automatic data generation is a key component of automated software testing. Random generation of test input data can uncover some bugs in software, but its effectiveness decreases when those inputs must satisfy complex properties in order to be meaningful. In this work, we study an evolutionary approach to generate values that can be encoded as algebraic data types plus additional properties. First, the approach is illustrated with the generation of sorted lists. Then, we generalize the technique to arbitrary algebraic data type definitions. Finally, we consider the problem of constrained data types where the data must satisfy some nontrivial property, using the well-known example of red-black trees for our experiments. This example will allow us to introduce the main principles of evolutionary algorithms and how these principles can be applied to obtain valid, nontrivial samples of a given data structure. Our experiments have revealed that this evolutionary approach is able to improve diversity, and increase the size of valid generated values with respect to simple random sampling techniques.
Read full abstract