Symbolic regression methods generate expression trees that simultaneously define the functional form of a regression model and the regression parameter values. As a result, the regression problem can search many nonlinear functional forms using only the specification of simple mathematical operators such as addition, subtraction, multiplication, and division, among others. Currently, state-of-the-art symbolic regression methods leverage genetic algorithms and adaptive programming techniques. Genetic algorithms lack optimality certifications and are typically stochastic in nature. In contrast, we propose an optimization formulation for the rigorous deterministic optimization of the symbolic regression problem. We present a mixed-integer nonlinear programming (MINLP) formulation to solve the symbolic regression problem as well as several alternative models to eliminate redundancies and symmetries. We demonstrate this symbolic regression technique using an array of experiments based upon literature instances. We then use a set of 24 MINLPs from symbolic regression to compare the performance of five local and five global MINLP solvers. Finally, we use larger instances to demonstrate that a portfolio of models provides an effective solution mechanism for problems of the size typically addressed in the symbolic regression literature.
Read full abstract