Genetic programming - a tool for flexible rule extraction

R Konig, L Niklasson, U Johansson

doi:10.1109/cec.2007.4424621

Abstract

Although data mining is performed to support decision making, many of the most powerful techniques, like neural networks and ensembles, produce opaque models. This lack of interpretability is an obvious disadvantage, since decision makers normally require some sort of explanation before taking action. To achieve comprehensibility, accuracy is often sacrificed by using simpler, transparent models, such as decision trees. An alternative is rule extraction, i.e., transforming the opaque model into a comprehensible model while keeping acceptable accuracy. We have previously suggested a rule extraction algorithm named G-REX, which is based on genetic programming. One key property of G-REX, due to the use of genetic programming, is the possibility of using different representation languages. In this study we apply G-REX to estimation tasks. More specifically, three representation languages are evaluated using eight publicly available data sets. The quality of the extracted rules is compared to two standard techniques producing comprehensible models: multiple linear regression and the decision tree algorithm C&RT. The results show that G-REX outperforms the standard techniques, but that the choice of representation language is important.
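To make the idea of rule extraction for estimation tasks concrete, the following is a minimal sketch of how a candidate rule could be scored against an opaque model. It is not the G-REX implementation. The regression-tree representation, the names `Leaf`, `Split`, `predict`, `size` and `fitness`, and the size penalty are all assumptions made for illustration; the abstract states only that the extracted model should be comprehensible while keeping acceptable accuracy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    value: float  # constant estimate returned by this leaf


@dataclass
class Split:
    feature: int      # index of the input variable being tested
    threshold: float  # the test is x[feature] < threshold
    left: "Rule"      # branch taken when the test is true
    right: "Rule"     # branch taken when the test is false


# Hypothetical representation language: a small regression tree.
Rule = Union[Leaf, Split]


def predict(rule: Rule, x: Sequence[float]) -> float:
    """Walk the tree until a leaf is reached and return its value."""
    while isinstance(rule, Split):
        rule = rule.left if x[rule.feature] < rule.threshold else rule.right
    return rule.value


def size(rule: Rule) -> int:
    """Number of nodes, used here as a crude measure of comprehensibility."""
    if isinstance(rule, Leaf):
        return 1
    return 1 + size(rule.left) + size(rule.right)


def fitness(rule: Rule,
            opaque: Callable[[Sequence[float]], float],
            inputs: Sequence[Sequence[float]],
            size_penalty: float = 0.01) -> float:
    """Lower is better: mean absolute deviation from the opaque model's
    predictions, plus an assumed penalty per node to favour short rules."""
    error = sum(abs(predict(rule, x) - opaque(x)) for x in inputs) / len(inputs)
    return error + size_penalty * size(rule)


# Stand-in for an opaque model (any callable would do in this sketch).
def opaque_model(x: Sequence[float]) -> float:
    return 1.0 if x[0] < 0.5 else 3.0


inputs = [(0.1,), (0.4,), (0.6,), (0.9,)]
exact_rule = Split(feature=0, threshold=0.5, left=Leaf(1.0), right=Leaf(3.0))
constant_rule = Leaf(2.0)

score_exact = fitness(exact_rule, opaque_model, inputs)
score_constant = fitness(constant_rule, opaque_model, inputs)
```

In a genetic programming setting, a population of such rules would be varied by crossover and mutation and selected according to a score of this kind. Changing the representation language would amount to changing which node types are allowed in a rule.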

Full Text