Abstract

The learning of classification models to predict the class labels of new, previously unseen data instances is one of the most essential tasks in data mining. A popular approach to classification is ensemble learning, in which a combination of several diverse and independent classification models is used to predict class labels. Ensemble models are important because they tend to improve classification accuracy over the individual members of the ensemble. However, classification models are also often required to be explainable, to reduce the risk of irreversible wrong classifications. Explainability of classification models is needed in many critical applications, such as stock market analysis, credit risk evaluation and intrusion detection. Unfortunately, ensemble learning decreases the explainability of the classification, as the analyst would have to examine many decision models to gain insights into the causality of a prediction. The aim of the research presented in this paper is to create an ensemble method that is explainable in the sense that it presents the human analyst with a condensed view of the most relevant model aspects involved in the prediction. To achieve this aim, the authors developed a rule-based explainable ensemble classifier termed Ranked ensemble G-Rules (ReG-Rules), which gives the analyst an extract of the most relevant classification rules for each individual prediction. ReG-Rules was evaluated in terms of its theoretical computational complexity, empirically on benchmark datasets, and qualitatively with respect to the complexity and readability of the induced rule sets. The results show that ReG-Rules scales linearly, achieves high accuracy and, at the same time, delivers a compact and manageable set of rules describing the predictions made.
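To make the idea of "an extract of the most relevant classification rules for each individual prediction" concrete, the following is a minimal sketch, not the authors' ReG-Rules algorithm. It assumes that each base classifier is an ordered list of rules whose terms are numeric intervals, that a base classifier abstains when none of its rules covers an instance, and that each base classifier carries a hypothetical rank weight (for example, its accuracy on held-out data). The names `Rule`, `BaseClassifier`, `weight` and `predict_with_explanation` are illustrative only, and the weighted vote is a stand-in for whatever ranking scheme the paper actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, float, float]  # (attribute, lower bound, upper bound), inclusive


@dataclass(frozen=True)
class Rule:
    """IF all interval terms hold THEN predict `label` (hypothetical representation)."""
    terms: Tuple[Term, ...]
    label: str

    def covers(self, x: Dict[str, float]) -> bool:
        return all(lo <= x[attr] <= hi for attr, lo, hi in self.terms)


@dataclass
class BaseClassifier:
    rules: List[Rule]  # ordered rule list: the first covering rule fires
    weight: float      # assumed rank weight, e.g. accuracy on held-out data

    def fire(self, x: Dict[str, float]) -> Optional[Rule]:
        for rule in self.rules:
            if rule.covers(x):
                return rule
        return None  # no rule covers x: this member abstains


def predict_with_explanation(
    ensemble: List[BaseClassifier], x: Dict[str, float]
) -> Tuple[Optional[str], List[Rule]]:
    """Weighted vote; return the winning label and the distinct rules that voted for it."""
    scores: Dict[str, float] = defaultdict(float)
    fired: Dict[str, List[Rule]] = defaultdict(list)
    for clf in ensemble:
        rule = clf.fire(x)
        if rule is not None:
            scores[rule.label] += clf.weight
            fired[rule.label].append(rule)
    if not scores:
        return None, []  # every member abstained
    winner = max(scores, key=scores.get)
    # Identical rules from different members are listed once, keeping first-seen order.
    return winner, list(dict.fromkeys(fired[winner]))


# Usage with made-up rules (attribute names and bounds are illustrative only).
r1 = Rule((("petal_length", 1.0, 1.9),), "setosa")
r2 = Rule((("petal_length", 3.0, 5.1), ("petal_width", 1.0, 1.8)), "versicolor")
r3 = Rule((("petal_width", 1.5, 2.5),), "virginica")
ensemble = [
    BaseClassifier([r1, r2], weight=0.9),
    BaseClassifier([r3, r1], weight=0.7),
    BaseClassifier([r2], weight=0.8),
]
label, explanation = predict_with_explanation(ensemble, {"petal_length": 4.5, "petal_width": 1.6})
print(label, len(explanation))  # versicolor 1
```

Here two members vote for `versicolor` through the same rule, so the analyst is shown a single rule rather than one per member; this is the kind of reduction in rule examinations the abstract refers to, although the paper's own mechanism may differ.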

Highlights

  • Evaluation using separate training and test datasets: Table 4 compares three types of induced rule sets for each dataset: (1) the number of rules generated by the G-Rules-IQR classifier, (2) the average number of rules induced by the ReG-Rules classifier before applying the local Rule Merging (RM) algorithm, and (3) the average number of rules generated by ReG-Rules after applying the local RM algorithm to the rule sets of its selected base classifiers.

Summary

INTRODUCTION

One of the most important tasks in Data Mining applications is predictive analytics, or, in other words, the classification of previously unseen data instances by learning models from training data with known ground truth. This paper proposes a new rule-based ensemble learner that differs from its predecessors in that it aims to maximise overall accuracy while maintaining a high level of explainability, measured in terms of the rule examinations needed to trace individual predictions. It is based on the most recent G-Rules-IQR (Interquartile Range) approach, chosen for its more expressive rule term structure, and proposes a method to merge local rule sets, which in turn minimises the number of rules the human analyst must examine to explain a prediction. An empirical evaluation presented in this paper shows that the proposed ensemble approach achieves a higher classification accuracy than the original G-Rules-IQR classifier, has a much lower abstaining rate, and produces a moderately sized set of rules per prediction, thus maintaining a high level of explainability for the human analyst.
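A rule-based classifier abstains when none of its rules covers an instance. The sketch below illustrates, under simplifying assumptions, why combining several base classifiers can lower the abstaining rate: the ensemble only abstains when every member abstains. It is not the ReG-Rules combination or rule-merging procedure; it swaps in a plain majority vote over single-term rules, and the names `classify`, `ensemble_classify` and `abstaining_rate`, as well as the toy rules and data, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Rule = Tuple[str, float, float, str]  # (attribute, lower, upper, label); one term for brevity
Instance = Dict[str, float]


def classify(rules: List[Rule], x: Instance) -> Optional[str]:
    """Return the label of the first covering rule, or None to abstain."""
    for attr, lo, hi, label in rules:
        if lo <= x[attr] <= hi:
            return label
    return None


def ensemble_classify(members: List[List[Rule]], x: Instance) -> Optional[str]:
    """Plain majority vote over non-abstaining members; abstain only if all abstain."""
    votes = [p for p in (classify(m, x) for m in members) if p is not None]
    if not votes:
        return None
    # sorted() makes tie-breaking deterministic (alphabetically first label wins).
    return max(sorted(set(votes)), key=votes.count)


def abstaining_rate(predict: Callable[[Instance], Optional[str]], data: Sequence[Instance]) -> float:
    """Fraction of instances for which `predict` returns no class label."""
    return sum(predict(x) is None for x in data) / len(data)


single = [("a", 0.0, 0.5, "yes")]
members = [single, [("a", 0.5, 1.0, "no")], [("b", 0.0, 0.5, "yes")]]
data = [{"a": 0.2, "b": 0.9}, {"a": 0.7, "b": 0.3}, {"a": 0.9, "b": 0.8}, {"a": 1.5, "b": 1.5}]

single_rate = abstaining_rate(lambda x: classify(single, x), data)
ensemble_rate = abstaining_rate(lambda x: ensemble_classify(members, x), data)
print(single_rate, ensemble_rate)  # 0.75 0.25
```

The single classifier leaves three of the four toy instances unclassified, while the three-member ensemble leaves only the instance that no member covers; the actual reduction reported in the paper depends on its own base classifiers and datasets.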

RELATED WORK
INDUCING RULE-TERMS DIRECTLY FROM NUMERICAL
G-RULES-IQR ALGORITHM
FRAMEWORK FOR THE ENSEMBLE CLASSIFIER
ENSEMBLE DIVERSITY GENERATION
BASE CLASSIFIERS INDUCTIONS
EVALUATION
CONCLUSION