Abstract

Tools for understanding and explaining complex predictive models are critical for user acceptance and trust. One such tool is rule extraction, i.e., approximating opaque models with less powerful but interpretable models. Pedagogical (or black-box) rule extraction, where the interpretable model is induced using the original training instances but with the predictions from the opaque model as targets, has many advantages over the decompositional (white-box) approach. Most importantly, pedagogical methods are agnostic to the kind of opaque model used, and any learning algorithm producing interpretable models can be employed for the learning step. The pedagogical approach has, however, one main problem that clearly limits its utility. Specifically, while the extracted models are trained to mimic the opaque model, there is no guarantee that this fidelity will transfer to novel data. This potentially low test set fidelity must be considered a severe drawback, in particular when the extracted models are used for explanation and analysis. This paper proposes a novel approach for extracting interpretable regression models from opaque models, which addresses the test set fidelity problem by using the conformal prediction framework. The extracted models are standard regression trees, augmented with valid prediction intervals in the leaves. Depending on the exact setup, the use of conformal prediction guarantees that either the test set fidelity or the test set accuracy will, in the long run, be equal to a preset confidence level. An extensive empirical investigation using 20 publicly available data sets demonstrates the validity of the extracted models. In addition, it is shown how normalization can be used to provide individualized prediction intervals, yielding highly informative extracted models.
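
To make the idea in the abstract concrete, the following is a minimal sketch of pedagogical extraction combined with split conformal calibration. It is not the authors' implementation. Everything named here is an assumption made for illustration: `opaque_model` is a hypothetical stand-in for a trained black-box regressor, `fit_stump` is a one-split regression tree rather than a full tree, and a single global interval half-width is used instead of per-leaf or normalized intervals. Calibrating on the opaque model's predictions, as done here, targets test set fidelity; calibrating on the true labels instead would target test set accuracy.

```python
import math
import random


def fit_stump(xs, targets):
    """Fit a one-split regression tree on 1-D inputs by minimizing squared error."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left = [targets[i] for i in order[:k]]
        right = [targets[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left)
        sse += sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            threshold = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def conformal_half_width(scores, confidence):
    """Return the ceil((n + 1) * confidence)-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        return math.inf  # too few calibration points for this confidence level
    return sorted(scores)[rank - 1]


def opaque_model(x):
    """Hypothetical stand-in for a trained opaque regressor (assumption)."""
    return math.sin(x) + 0.5 * x


rng = random.Random(0)
train_x = [rng.uniform(0, 6) for _ in range(200)]
cal_x = [rng.uniform(0, 6) for _ in range(200)]
test_x = [rng.uniform(0, 6) for _ in range(1000)]

# Pedagogical step: the targets are the opaque model's predictions, not true labels.
tree = fit_stump(train_x, [opaque_model(x) for x in train_x])

# Calibration step: nonconformity score = absolute disagreement with the opaque model.
scores = [abs(opaque_model(x) - tree(x)) for x in cal_x]
half_width = conformal_half_width(scores, confidence=0.9)

# Fraction of test points where the opaque prediction lies inside tree(x) +/- half_width.
covered = sum(abs(opaque_model(x) - tree(x)) <= half_width for x in test_x)
fidelity_coverage = covered / len(test_x)
print(f"half-width: {half_width:.3f}, fidelity coverage: {fidelity_coverage:.3f}")
```

Under the usual exchangeability assumption between calibration and test instances, the coverage printed above should be close to the chosen confidence level of 0.9 on average; with a finite calibration set it will fluctuate around that value rather than match it exactly.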

