Abstract

Machine learning (ML) models are becoming an essential part of software systems in practice. Validating ML applications is a challenging and time-consuming process for developers, since prediction accuracy heavily depends on the generated models. ML applications are written in a comparatively data-driven style on top of black-box ML frameworks. If every dataset and the ML application itself must be investigated individually, ML debugging tasks take considerable time and effort. To address this limitation, we present MLDBUG, a novel debugging technique for machine learning applications that helps ML application developers inspect the training data and the features generated for the ML model. Inspired by software debugging techniques for reproducing reported bugs, MLDBUG takes an ML application and its training datasets as input, builds the ML models, and helps developers easily reproduce and understand anomalies in the ML application. We have implemented MLDBUG as an Eclipse plugin that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data within the Eclipse IDE. In our evaluation, we used 23,500 documents from the bioengineering research domain. We assessed how effectively MLDBUG helps ML application developers investigate the connection between the produced features and the labels in the training model, and the relationship between the training instances and the instances the model predicts.
