Improving GP generalization: a variance-based layered learning approach

Maryam Amir Haeri, Mohammad Mehdi Ebadzadeh, Gianluigi Folino

doi:10.1007/s10710-014-9220-6

Abstract

This paper introduces variance-based layered learning GP, a new method that improves the generalization ability of genetic programming (GP) for symbolic regression problems. In this approach, several datasets, called primitive training sets, are derived from the original training data. They are ordered from least to most complex according to a suitable complexity measure, and even the last primitive dataset is less complex than the original training set. The approach decomposes the evolution process into several hierarchical layers. The first layer of the evolution uses the least complex (smoothest) primitive training set. In subsequent layers, progressively more complex primitive sets are given to the GP engine. Finally, the original training data is given to the algorithm. We use the variance of the output values of a function as a measure of its functional complexity. This measure is used both to generate smoother training data and to control the functional complexity of the solutions, which reduces overfitting. The experiments, conducted on four real-world and three artificial symbolic regression problems, demonstrate that the approach enhances the generalization ability of GP and reduces the complexity of the obtained solutions.
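
The layer ordering described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the primitive training sets are constructed, so the shrink-toward-the-mean construction below is an assumption chosen only because it lowers output variance in a controlled way; it should not be read as the authors' procedure. The names `primitive_training_sets`, `layer_schedule`, and `scales` are hypothetical, and the GP engine itself is omitted.

```python
from statistics import fmean, pvariance


def primitive_training_sets(ys, scales=(0.25, 0.5, 0.75)):
    """Return target vectors ordered from smoothest to most complex.

    Assumption (not from the paper): each target is pulled toward the
    mean by a factor s < 1, so the output variance becomes s**2 * var(ys).
    """
    mean = fmean(ys)
    return [[mean + s * (y - mean) for y in ys] for s in scales]


def layer_schedule(ys, scales=(0.25, 0.5, 0.75)):
    """Targets for each layer: primitive sets first, original data last."""
    return primitive_training_sets(ys, scales) + [list(ys)]


# Toy target values; in practice these would be the training outputs.
ys = [1.0, 4.0, 2.0, 8.0, 5.0]
layers = layer_schedule(ys)

# Output variance is the complexity measure named in the abstract.
variances = [pvariance(layer) for layer in layers]
for index, variance in enumerate(variances, start=1):
    print(f"layer {index}: output variance = {variance:.3f}")
```

Each layer would hand its targets to the GP engine, with the population from one layer presumably seeding the next. The sketch only checks the property the abstract states: complexity rises from layer to layer, and the last primitive set remains less complex than the original data.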
