Abstract

Machine Learning (ML) prediction algorithms have made significant contributions in many domains, leading to their increasingly widespread use. However, as the adoption of ML algorithms surges, the need for transparent and interpretable models becomes essential. Visual representations have proven instrumental in addressing this issue, allowing users to grasp models’ inner workings. Despite their popularity, visualization techniques still present visual scalability limitations, mainly when applied to analyze popular and complex models such as Random Forests (RF). In this work, we propose Random Forest Similarity Map (RFMap), a scalable interactive visual analytics tool designed to analyze RF ensemble models. RFMap focuses on explaining the inner working mechanism of models through different views that describe individual data instance predictions, provide an overview of the entire forest of trees, and highlight instance input feature values. The interactive nature of RFMap allows users to visually interpret model errors and decisions, establishing the necessary confidence and user trust in RF models and helping improve their performance.

Highlights

  • Machine Learning (ML) algorithms have seen widespread usage in numerous fields over the past few years

  • We present two usage scenarios to evaluate the effectiveness of Random Forest Similarity Map (RFMap) in interpreting and visualizing Random Forest (RF) models

  • Karen uses our RFMap system to visualize and interpret an RF model she has developed to classify breast cancer diagnoses. The dataset she uses to train the RF model is from the University of Wisconsin (Wisconsin Breast Cancer Diagnostic [65]); it contains samples of solid breast masses collected from 569 patients, of which 357 were labeled Benign (B) and 212 Malignant (M). A minimal code sketch of this training scenario follows the list
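
The sketch below is a minimal, illustrative reconstruction of this scenario, not the RFMap implementation: it trains an RF classifier on scikit-learn's copy of the Wisconsin Breast Cancer Diagnostic data and collects the per-tree votes for one test instance, the kind of per-instance prediction detail that RFMap's views summarize. The train/test split and hyperparameters are assumptions made only for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 569 samples: 357 benign and 212 malignant (in scikit-learn, label 0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative hyperparameters; the usage scenario does not prescribe these values.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Per-tree predictions for a single test instance: each tree casts one vote, and the
# distribution of votes is the instance-level information an analyst inspects for
# uncertain or misclassified cases.
instance = X_test[:1]
votes = [rf.classes_[int(tree.predict(instance)[0])] for tree in rf.estimators_]
malignant_share = sum(v == 0 for v in votes) / len(votes)
print("forest prediction (0 = malignant, 1 = benign):", rf.predict(instance)[0])
print("share of trees voting malignant:", malignant_share)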

Introduction

Machine Learning (ML) algorithms have seen widespread usage in numerous fields over the past few years. The drive for better predictive performance in real-life use cases often comes with an intrinsic problem: interpreting the produced results [7]. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), software for judging the likelihood of a criminal defendant becoming a recidivist, has been widely criticized for its racially biased decisions [8]. The algorithm's results indicated that defendants of color were assigned a greater risk of recidivism than white defendants, and the reasons are unclear since race is not used for prediction. Since ML techniques have become ubiquitous, especially in crucial decision-making involving humans, there is considerable demand to explain how these complex algorithms work and to help decision-makers gain confidence and trust in them [9].