Abstract

Models often need to be constrained to a certain size for them to be considered interpretable. For example, a decision tree of depth 5 is much easier to understand than one of depth 50. Limiting model size, however, often reduces accuracy. We suggest a practical technique that minimizes this trade-off between interpretability and classification accuracy. This enables an arbitrary learning algorithm to produce highly accurate small-sized models. Our technique identifies the distribution over the training data that, when learned from, yields the highest accuracy for a model of a given size. We represent the training distribution as a combination of sampling schemes. Each scheme is defined by a parameterized probability mass function applied to the segmentation produced by a decision tree. An Infinite Mixture Model with Beta components is used to represent a combination of such schemes. The mixture model parameters are learned using Bayesian Optimization. A naive formulation would need to optimize over O(d) variables for a distribution over a d-dimensional input space, which is cumbersome for most real-world data. However, we show that our technique significantly reduces this number to a fixed set of eight variables at the cost of relatively cheap preprocessing. The proposed technique is flexible: it is model-agnostic, i.e., it may be applied to the learning algorithm for any model family, and it admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models and gradient boosted models of different sizes. We observe significant improvements in the F1-score in most instances, exceeding 100% in some cases.
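
To make the recipe above concrete, the sketch below instantiates it under strong simplifying assumptions: a single Beta-weighted sampling scheme over the leaves of an auxiliary decision tree, tuned with a plain random search rather than the Infinite Beta Mixture Model and Bayesian Optimization described in the abstract. All function and variable names (e.g., `sample_and_score`) are illustrative, not taken from the paper's code.

```python
# Minimal sketch of the idea described in the abstract: choose a sampling
# distribution over the training data (defined on the leaves of an auxiliary
# decision tree) that maximizes the F1-score of a size-constrained model.
# Assumption: one Beta-based sampling scheme and a random search stand in
# for the paper's Infinite Beta Mixture Model and Bayesian Optimization.
import numpy as np
from scipy.stats import beta
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Segment the input space with a deep auxiliary tree: every training point
#    falls into exactly one leaf.
segmenter = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
leaf_ids = segmenter.apply(X_tr)
leaves, leaf_sizes = np.unique(leaf_ids, return_counts=True)

# 2. Rank leaves by size and map each rank into (0, 1). A Beta pmf over this
#    ranking controls how strongly small (boundary-hugging) leaves are
#    over- or under-sampled.
order = np.argsort(leaf_sizes)
rank01 = {leaf: (r + 0.5) / len(leaves) for r, leaf in enumerate(leaves[order])}

def sample_and_score(a, b, depth=5):
    """Resample the training set with Beta(a, b) weights over leaf ranks,
    fit a small model of the given depth, and return its validation F1."""
    w = np.array([beta.pdf(rank01[l], a, b) for l in leaf_ids])
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True, p=w / w.sum())
    small = DecisionTreeClassifier(max_depth=depth, random_state=0)
    small.fit(X_tr[idx], y_tr[idx])
    return f1_score(y_val, small.predict(X_val))

# 3. Search the (few) distribution parameters -- a cheap stand-in for the
#    Bayesian Optimization loop. The paper searches a fixed set of eight
#    variables; here only the two Beta shape parameters are searched.
baseline_model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
baseline_f1 = f1_score(y_val, baseline_model.predict(X_val))
best_f1, best_a, best_b = max(
    (sample_and_score(a, b), a, b) for a, b in rng.uniform(0.1, 10.0, size=(50, 2))
)
print(f"baseline F1={baseline_f1:.3f}  optimized-sampling F1={best_f1:.3f}")
```

The point of the sketch is the shape of the loop, not the specific search: only a handful of distribution parameters are tuned, while the size constraint on the final model (here, `max_depth=5`) is held fixed.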

Highlights

  • As Machine Learning (ML) becomes pervasive in our daily lives, there is an increased desire to know how models reach specific decisions

  • Referring back to our desiderata, it should be clear how we address some of the challenges: 1. The locations of class boundaries are naturally produced by decision trees (DT), in the form of low-volume leaf regions (see the sketch after this list)

  • The improvements look different from what we observed for DT, which is to be expected across different model families
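
The second highlight refers to low-volume leaf regions of a decision tree as natural markers of class boundaries. The sketch below shows one way to compute relative leaf volumes from a fitted scikit-learn tree; the helper `leaf_volumes` is our illustration, not code from the paper.

```python
# Illustrative sketch for the highlight above: identify the low-volume leaf
# regions of a fitted decision tree, which tend to hug class boundaries.
# Volumes are computed relative to the data's bounding box.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_volumes(tree, X):
    """Return {leaf_node_id: volume of its hyperrectangle / volume of data box}."""
    t = tree.tree_
    lo, hi = X.min(axis=0), X.max(axis=0)
    total = np.prod(hi - lo)
    out = {}

    def recurse(node, lo, hi):
        if t.children_left[node] == -1:          # leaf node
            out[node] = np.prod(hi - lo) / total
            return
        f, thr = t.feature[node], t.threshold[node]
        lhi, rlo = hi.copy(), lo.copy()
        lhi[f] = min(hi[f], thr)                 # left child: x[f] <= thr
        rlo[f] = max(lo[f], thr)                 # right child: x[f] > thr
        recurse(t.children_left[node], lo, lhi)
        recurse(t.children_right[node], rlo, hi)

    recurse(0, lo.copy(), hi.copy())
    return out

# Usage (hypothetical): the smallest-volume leaves approximate where classes meet.
# dt = DecisionTreeClassifier(max_depth=8).fit(X, y)
# vols = leaf_volumes(dt, X)
# boundary_leaves = sorted(vols, key=vols.get)[:5]
```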

Introduction

As Machine Learning (ML) becomes pervasive in our daily lives, there is an increased desire to know how models reach specific decisions. In certain contexts this might not matter as long as the ML model itself works well, e.g., in product or movie recommendations; in others, however, understanding a model's reasoning is essential, and regulations governing digital interactions may mandate interpretability (Goodman and Flaxman, 2017). All these factors have generated a lot of interest around "model understanding." Approaches in the area may be broadly divided into two categories: 1. Interpretability: build models that are inherently easy to interpret, e.g., rule lists (Letham et al, 2013; Angelino et al, 2017), decision trees (Breiman et al, 1984; Quinlan, 1993, 2004), sparse linear models (Ustun and Rudin, 2016), decision sets (Lakkaraju et al, 2016), and pairwise interaction models that may be linear (Lim and Hastie, 2015) or additive (Lou et al, 2013). 2. Explainability: produce post-hoc explanations for the predictions of a model that is otherwise treated as a black box.
