The representation of individuals in Genetic Programming (GP) has a large impact on the evolutionary process. In previous work, we investigated the evolutionary process of three Grammar-Guided GP (GGGP) methods, Context-Free Grammars GP (CFG-GP), Grammatical Evolution (GE) and Structured Grammatical Evolution (SGE), in the context of a complex, real-world problem: predicting the glucose level of people with diabetes two hours ahead of time. We concluded that the choice of representation has more impact at a higher maximum depth, and that CFG-GP explores the search space better for deeper trees, achieving better results. Furthermore, we found that CFG-GP relies more on feature construction, whereas GE and SGE rely more on feature selection. Additionally, we altered the GGGP methods in two ways: using ϵ-lexicase selection, which solved the overfitting problem of CFG-GP and helped it adapt to patients with high glucose variability; and penalizing complex trees, to create more interpretable trees. Combining ϵ-lexicase selection with CFG-GP performed best. In this work, we extend the previous work and evaluate the impact of initialization methods on the quality of solutions. We find that they have no significant impact, even in settings where changing the representation does.