On linear genetic programming

Markus Brameier

doi:10.17877/de290r-253

Abstract

The thesis is about linear genetic programming (LGP), a machine learning approach that evolves computer programs as sequences of imperative instructions. Two fundamental differences to the more common tree-based variant (TGP) may be identified. These are the graph-based functional structure of linear genetic programs, on the one hand, and the existence of structurally noneffective code, on the other hand. The two major objectives of this work comprise (1) the development of more advanced methods and variation operators to produce better and more compact program solutions and (2) the analysis of general EA/GP phenomena in linear GP, including intron code, neutral variations, and code growth, among others. First, we introduce efficient algorithms for extracting features of the imperative and functional structure of linear genetic programs. In doing so, especially the detection and elimination of noneffective code during runtime will turn out as a powerful tool to accelerate the time-consuming step of fitness evaluation in GP. Variation operators are discussed systematically for the linear program representation. We will demonstrate that so called effective instruction mutations achieve the best performance in terms of solution quality. These mutations operate only on the (structurally) effective code and restrict the mutation step size to one instruction. One possibility to further improve their performance is to explicitly increase the probability of neutral variations. As a second, more time-efficient alternative we explicitly control the mutation step size on the effective code (effective step size). Minimum steps do not allow more than one effective instruction to change its effectiveness status. That is, only a single node may be connected to or disconnected from the effective graph component. It is an interesting phenomenon that, to some extent, the effective code becomes more robust against destructions over the generations already implicitly. A special concern of this thesis is to convince the reader that there are some serious arguments for using a linear representation. In a crossover-based comparison LGP has been found superior to TGP over a set of benchmark problems. Furthermore, linear solutions turned out to be more compact than tree solutions due to (1) multiple usage of subgraph results and (2) implicit parsimony pressure by structurally noneffective code. The phenomenon of code growth is analyzed for different linear genetic operators. When applying instruction mutations exclusively almost only neutral variations may be held responsible for the emergence and propagation of intron code. It is noteworthy that linear genetic programs may not grow if all neutral variation effects are rejected and if the variation step size is minimum. For the same reasons effective instruction mutations realize an implicit complexity control in linear GP which reduces a possible negative effect of code growth to a minimum. Another noteworthy result in this context is that program size is strongly increased by crossover while it is hardly influenced by mutation even if step sizes are not explicitly restricted.

Full Text