The Impact of Data Completeness and Correctness on Explainable Machine Learning Models

Shelernaz Azimi,Claus Pahl

doi:10.26421/jdi3.2-2

Abstract

Many systems in the Edge Cloud, the Internet-of-Things or Cyber-Physical Systems are built for processing data, which is delivered from sensors and devices, transported, processed and consumed locally by actuators. This, given the regularly high volume of data, permits Artificial Intelligence (AI) strategies like Machine Learning (ML) to be used to generate the application and management functions needed. The quality of both source data and machine learning model is here unavoidably of high significance, yet has not been explored sufficiently as an explicit connection of the ML model quality that are created through ML procedures to the quality of data that the model functions consume in their construction. Here, we investigated the link between input data quality for ML function construction and the quality of these functions in data-driven software systems towards explainable model construction through an experimental approach with IoT data using decision trees.We have 3 objectives in this research: 1. Search for indicators that influence data quality such as correctness and completeness and model construction factors on accuracy, precision and recall. 2. Estimate the impact of variations in model construction and data quality. 3. Identify change patterns that can be attributed to specific input changes. This ultimately aims to support {\em explainable AI}, i.e., the better understanding of how ML models work and what impacts on their quality.

Full Text