Abstract

We introduce a framework for discovering interpretable regression models for high-stakes decision making in safety-critical systems. The core of our proposal is a multi-objective hierarchical symbolic regression algorithm that computes cluster-specific rankings of regression models ordered by increasing complexity. We discover the hierarchical structure by clustering the feature importances produced by a post-hoc explainability framework (viz., SHAP) applied to a highly flexible predictive model (viz., XGBoost). We rely on a symbolic regression algorithm based on the simulated annealing meta-heuristic to infer sparse linear models that may include non-linear effects (e.g., log-transforms and multiplicative interactions). The search is guided by two objectives: maximizing predictive performance and minimizing complexity. It yields a list of Pareto-optimal models that supports a dynamic interpretative process: users navigate from the least to the most complex model and decide which ones they can trust, depending on whether they understand them and whether they are satisfied with the quantified uncertainty of their parameters and predictions. Our approach achieves promising results when compared with more than ten other interpretable or black-box predictive models on eleven public regression datasets of varying sizes, dimensionalities, and domains, and on a proprietary dataset for highway crash prediction. On this last dataset, we demonstrate the usefulness of our new ranking-by-complexity of inherently interpretable models.
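To make the idea of a complexity-ordered list of Pareto-optimal models concrete, here is a minimal sketch in Python. It is not the authors' implementation. The `Candidate` type, the `pareto_front` function, the use of a term count as the complexity measure, and all error values are illustrative assumptions. The sketch only shows how candidates scored on two objectives (lower complexity, lower error) could be filtered to the non-dominated ones and listed from simplest to most complex.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one candidate regression model."""
    formula: str      # human-readable model formula (illustrative only)
    complexity: int   # assumed complexity measure, e.g. number of terms
    error: float      # assumed validation error, lower is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by increasing complexity.

    A candidate is dominated when another one is at least as simple and at
    least as accurate, and strictly better on one of the two objectives.
    """
    # Sort by complexity, breaking ties by error, so a single pass suffices.
    ordered = sorted(candidates, key=lambda c: (c.complexity, c.error))
    front: List[Candidate] = []
    best_error = float("inf")
    for cand in ordered:
        # Keep a candidate only if it improves on every simpler model so far.
        if cand.error < best_error:
            front.append(cand)
            best_error = cand.error
    return front


# Made-up scores, purely for illustration.
candidates = [
    Candidate("y ~ x1", 1, 0.90),
    Candidate("y ~ x1 + log(x2)", 2, 0.70),
    Candidate("y ~ x1 + x3", 2, 0.80),
    Candidate("y ~ x1 + log(x2) + x1*x3", 3, 0.75),
    Candidate("y ~ x1 + log(x2) + x3 + x1*x3", 4, 0.60),
]

front = pareto_front(candidates)
for cand in front:
    print(cand.complexity, cand.error, cand.formula)
```

In this toy run, the two-term model `y ~ x1 + x3` and the three-term model are dropped because a simpler or equally simple model already achieves a lower error. A user reading the remaining list top to bottom sees each added increment of complexity paired with an accuracy gain, which is the kind of navigation the abstract describes.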
