Abstract

Random Forests are among the most popular classifiers in machine learning. The larger they are, the more accurate their predictions become. This, however, comes at a cost: it becomes increasingly difficult to understand why a Random Forest made a specific choice, and its classification time grows linearly with its size (the number of trees). In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram, which has the following two effects: (1) minimal, sufficient explanations for Random Forest-based classifications can be obtained by means of a simple three-step reduction, and (2) the running time is radically improved. In fact, our experiments on various popular datasets show speed-ups of several orders of magnitude while, at the same time, significantly reducing the size of the required data structure.

Highlights

  • Random Forests are one of the most widely known classifiers in machine learning [2,19]

  • We present an optimisation method based on algebraic aggregation: Random Forests are transformed into a single decision diagram in a semantics-preserving fashion which, in particular, preserves the learner’s variance and accuracy

  • We present an approach to aggregate large Random Forests into single, compact decision diagrams that faithfully reflect the semantics of the original Random Forest for a considered purpose
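The linear growth in classification time mentioned above can be seen in a minimal sketch (hypothetical toy trees, not from the paper): a plain Random Forest must walk every tree for every input before taking the majority vote.

```python
# Toy illustration of baseline Random Forest classification: every tree is
# evaluated for every input, so cost per classification is O(number of trees).
# The trees and feature values below are hypothetical examples.

from collections import Counter

def classify_tree(tree, x):
    """Evaluate one tree: ('leaf', cls) or ('node', feature, thresh, low, high)."""
    while tree[0] == "node":
        _, feature, thresh, low, high = tree
        tree = low if x[feature] <= thresh else high
    return tree[1]

def classify_forest(forest, x):
    """Majority vote over all trees -- one full tree walk per tree, per input."""
    votes = Counter(classify_tree(t, x) for t in forest)
    return votes.most_common(1)[0][0]

# Three hand-written decision stumps over a 2-feature input:
forest = [
    ("node", 0, 0.5, ("leaf", "A"), ("leaf", "B")),
    ("node", 1, 0.5, ("leaf", "A"), ("leaf", "B")),
    ("node", 0, 0.7, ("leaf", "A"), ("leaf", "B")),
]
```

Note that the same predicate (here, comparisons on feature 0) may be tested repeatedly across trees; this redundancy is exactly what the aggregation into a single decision diagram eliminates.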


Introduction

Random Forests are one of the most widely known classifiers in machine learning [2,19]. We present an optimisation method based on algebraic aggregation: Random Forests are transformed into a single decision diagram in a semantics-preserving fashion which, in particular, preserves the learner’s variance and accuracy. The great advantage of the resulting decision diagrams is their absence of redundancy: during classification, every predicate is considered at most once, and only if its evaluation is required. This allows one to obtain concise explanations and evaluation times that are optimal (up to an underlying predicate ordering). Key to our approach are Algebraic Decision Diagrams (ADDs) [28]. Their algebraic structure supports compositional aggregation, abstraction, and reduction operations that lead to minimal normal forms. Using basic algebraic operations, such as concatenation and addition, allows us to aggregate a Random Forest into a single ADD that faithfully maintains the individual results of each tree in the forest. Abstracting the results (i.e. the leaf structure of the decision diagrams) to the essence, in this case the
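The aggregation idea can be sketched in a few lines (a simplified illustration under our own assumptions, not the authors' implementation): two toy decision trees over a shared, ordered set of Boolean predicates are combined with an ADD-style "apply" operation that adds vote vectors pointwise, so the resulting diagram tests each predicate at most once per classification.

```python
# Minimal sketch of aggregating trees into a single decision diagram.
# Predicates are indexed 0..n-1 under a fixed global ordering; a node is
# ("node", pred, low, high) and a leaf is ("leaf", votes), where votes is a
# tuple of per-class counts. All structures here are illustrative toys.

def leaf(cls, n_classes=2):
    votes = [0] * n_classes
    votes[cls] = 1
    return ("leaf", tuple(votes))

def node(pred, low, high):
    return ("node", pred, low, high)

def apply_add(a, b):
    """Combine two diagrams by pointwise addition of their vote vectors,
    splitting on the smallest predicate index (the ADD-style 'apply' step)."""
    if a[0] == "leaf" and b[0] == "leaf":
        return ("leaf", tuple(x + y for x, y in zip(a[1], b[1])))
    # Leaves are treated as having predicate index +infinity.
    pa = a[1] if a[0] == "node" else float("inf")
    pb = b[1] if b[0] == "node" else float("inf")
    p = min(pa, pb)
    a_lo, a_hi = (a[2], a[3]) if pa == p else (a, a)
    b_lo, b_hi = (b[2], b[3]) if pb == p else (b, b)
    lo, hi = apply_add(a_lo, b_lo), apply_add(a_hi, b_hi)
    if lo == hi:  # reduction: a test whose branches agree is redundant
        return lo
    return ("node", p, lo, hi)

def classify(diagram, assignment):
    """Walk the aggregated diagram once; return the majority class."""
    while diagram[0] == "node":
        diagram = diagram[3] if assignment[diagram[1]] else diagram[2]
    votes = diagram[1]
    return max(range(len(votes)), key=votes.__getitem__)

# Two toy trees over predicates x0 and x1, aggregated into one diagram:
t1 = node(0, leaf(0), node(1, leaf(0), leaf(1)))
t2 = node(1, leaf(0), leaf(1))
forest_add = apply_add(t1, t2)  # summed vote vectors at the leaves
```

Abstracting the leaf vote vectors to their majority class (as `classify` does) then corresponds to the reduction step that collapses the aggregate to the forest's class decision; in a full implementation this abstraction enables further diagram reductions.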
2 Algebraic decision structures
3 Random forests
4 The essence of ADDs
5 Co-domain algebras and their relationships
6 Correctness and optimality
7 Infeasible path reduction
8 Towards explainability
9 Experimental performance evaluation
10 Related work
11 Conclusions and future work

