Abstract

Feature selection is often desirable when learning from high-dimensional data, yet it is seldom considered in Genetic Programming (GP) for high-dimensional symbolic regression. This work develops a new method, Genetic Programming with Feature Selection (GPWFS), to improve the generalisation ability of GP for symbolic regression. GPWFS is a two-stage method: the first stage selects important/informative features from the fittest individuals, and the second stage uses the selected features, a subset of the original features, for regression. To investigate the learning/optimisation performance and generalisation capability of GPWFS, experiments using standard GP as a baseline were conducted on six real-world high-dimensional symbolic regression datasets. The results show that GPWFS achieves better performance on both the training and test sets in most cases. Further analysis of solution size, the number of distinct features, and the total number of features used in the evolved models shows that GPWFS induces more compact models with better interpretability and lower computational cost than standard GP.
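The two-stage idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and a GP individual is simplified to the list of feature indices its tree uses. Stage 1 evolves on all features and records which features appear in the fittest individuals; stage 2 would then restrict the terminal set to that subset.

```python
# Hedged sketch of the two-stage GPWFS idea (illustrative names only):
# stage 1 collects the features used by the fittest evolved individuals,
# and stage 2 evolves regression models over that reduced feature subset.

def features_used(individual):
    """Feature indices appearing in an individual.

    Here an individual is simplified to a flat list of the feature
    indices its tree references; a real GP tree would be traversed.
    """
    return set(individual)

def select_features(fittest_individuals):
    """Stage 1: union of features used across the fittest individuals."""
    selected = set()
    for ind in fittest_individuals:
        selected |= features_used(ind)
    return sorted(selected)

# Toy "fittest individuals" from a hypothetical stage-1 run over 10 features.
fittest = [[0, 3, 3, 7], [3, 5], [0, 7, 9]]

subset = select_features(fittest)
print(subset)  # stage 2 builds its terminal set from this subset only
```

Because stage 2 searches over a (typically much smaller) subset of the original features, the evolved models tend to be smaller and cheaper to evaluate, which is consistent with the compactness results reported in the abstract.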
