Abstract

Model complexity is closely related to both the generalization ability and the interpretability of learned models. Simple models are more likely to generalize well and are easier to interpret. However, placing too much emphasis on minimizing complexity can prevent the discovery of more complex yet more accurate solutions. Genetic programming (GP) tends to generate overly complex models that are difficult to interpret and generalize poorly. This work proposes a novel complexity measure based on the Rademacher complexity for GP for symbolic regression. The complexity of an evolved model is measured by the maximum correlation between the model and the Rademacher variables on the selected training instances. Taking the minimization of both the training error and the Rademacher complexity of the models as the two objectives, the proposed GP method is shown to be far superior to standard GP in generalization performance. Compared with GP equipped with two state-of-the-art complexity measures, the proposed method retains a notable advantage: it generates a better front consisting of individuals with lower generalization errors and lower behavioral complexity. Further analyses reveal that, compared with the state-of-the-art methods, the proposed GP method evolves models that are much closer to the target models in structure and have better interpretability.
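The complexity measure described above can be illustrated with a small Monte Carlo sketch: given a model's outputs on the selected training instances, its empirical Rademacher complexity is estimated as the expected absolute correlation with random ±1 (Rademacher) labels. This is a minimal sketch under assumed details; the function name, the averaging over draws, and the use of raw (uncentered) outputs are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def empirical_rademacher_complexity(outputs, n_draws=1000, rng=None):
    """Monte Carlo estimate of a single model's Rademacher complexity.

    outputs : the model's predictions f(x_1), ..., f(x_n) on the
              selected training instances.
    Returns the average over random draws of |(1/n) * sum_i sigma_i * f(x_i)|,
    where each sigma_i is an independent Rademacher variable (+1 or -1).
    """
    rng = np.random.default_rng(rng)
    outputs = np.asarray(outputs, dtype=float)
    n = outputs.size
    # One row of Rademacher variables per Monte Carlo draw.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # Correlation of the outputs with each sigma vector, averaged over draws.
    return float(np.mean(np.abs(sigma @ outputs) / n))
```

In a multi-objective GP setting, this estimate would serve as the second fitness objective alongside the training error, penalizing models whose outputs can fit random noise.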
