Tool Support for Improving Software Quality in Machine Learning Programs
Machine learning (ML) techniques discover knowledge from large amounts of data, and ML modeling is becoming essential to software systems in practice. ML research communities have focused on the accuracy and efficiency of ML models, while less attention has been paid to validating model quality. Validating ML applications is a challenging and time-consuming process for developers, since prediction accuracy heavily depends on the generated models. ML applications are written in a comparatively data-driven programming style on top of black-box ML frameworks, so the datasets and the ML application must each be investigated individually; validation therefore takes substantial time and effort. To address this limitation, we present MLVal, a novel quality validation technique that increases the reliability of ML models and applications. Our approach helps developers inspect the training data and the features generated for the ML model. Such data validation is important and beneficial to software quality, since the quality of the input data affects the speed and accuracy of both training and inference. Inspired by software debugging and validation for reproducing reported bugs, MLVal takes as input an ML application and its training datasets to build the ML models, helping ML application developers easily reproduce and understand anomalies in the ML application. We have implemented an Eclipse plugin for MLVal that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data in the Eclipse IDE. In our evaluation, we used 23,500 documents from the bioengineering research domain. We assessed how effectively the MLVal validation technique helps ML application developers: (1) investigate the connection between the produced features and the labels in the training model, and (2) detect errors early to secure model quality through better data.
Our approach reduces the engineering effort required to validate problems, improving data-centric workflows in ML application development.
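The abstract above argues that input-data quality should be checked before model building. As a minimal sketch of that idea (not MLVal's actual implementation; the function name, fields, and thresholds are illustrative assumptions), a validator can report missing values, undersized datasets, and degenerate label sets before training starts:

```python
# Hypothetical pre-training data validation sketch. All names and
# thresholds here are assumptions for illustration, not MLVal's API.

def validate_training_data(rows, required_fields, label_field, min_rows=10):
    """Return a list of human-readable problems found in the dataset."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"too few rows: {len(rows)} < {min_rows}")
    labels = set()
    for i, row in enumerate(rows):
        # Flag rows whose required fields are absent or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing value for '{field}'")
        labels.add(row.get(label_field))
    # A classifier needs at least two distinct labels to learn anything.
    if len(labels) < 2:
        problems.append("fewer than two distinct labels; model cannot discriminate")
    return problems
```

Running such checks as a gate before training surfaces data problems at the point where they are cheapest to fix, which is the data-centric workflow the abstract describes.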
- Conference Article
2
- 10.1109/compsac54236.2022.00166
- Jun 1, 2022
Modeling in machine learning (ML) is becoming an essential part of software systems in practice. Validating ML applications is a challenging and time-consuming process for developers, since prediction accuracy heavily depends on the generated models. ML applications are written in a comparatively data-driven programming style on top of black-box ML frameworks. If the datasets and the ML application must each be investigated individually, debugging takes substantial time and effort. To address this limitation, we present MLDBUG, a novel debugging technique for machine learning applications that helps ML application developers inspect the training data and the features generated for the ML model. Inspired by software debugging for reproducing reported bugs, MLDBUG takes as input an ML application and its training datasets to build the ML models, helping ML application developers easily reproduce and understand anomalies in the ML application. We have implemented an Eclipse plugin for MLDBUG that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data in the Eclipse IDE. In our evaluation, we used 23,500 documents in the bioengineering research domain. We assessed how effectively our debugging technique helps ML application developers investigate the connection between the produced features and the labels in the training model, and the relationship between the training instances and the instances the model predicts.
- Conference Article
7
- 10.1109/aiiot52608.2021.9454239
- May 10, 2021
Machine learning (ML) application development constitutes a complex process that requires developers to have a considerable amount of programming and specialized technical skill. There is still a substantial barrier for researchers who do not possess these skills to develop ML models that meet the needs of their specific fields of study. The typical ML model development workflow entails the installation of a set of software elements and a certain amount of code development. This paper introduces a browser-based ML application development tool geared toward the needs of researchers with limited programming knowledge. The tool allows users to create a customized image classifier model based on the image sets they provide. It offers a no-code workflow that enables users to create and test their ML models in a browser without downloading any software modules.
- Research Article
8
- 10.14778/3352063.3352110
- Aug 1, 2019
- Proceedings of the VLDB Endowment
Developing machine learning (ML) applications is similar to developing traditional software --- it is often an iterative process in which developers navigate within a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study which provides principled guidelines for this iterative process. However, as of today, the counterpart of "software engineering for ML" is largely missing --- developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models and meter is a "profiler" for controlling overfitting of ML models. Both systems focus on managing the "statistical generalization power" of datasets used for assessing the quality of ML applications, namely, the validation set and the test set. By demonstrating these two systems we hope to spawn further discussions within our community on building this new type of data management system for statistical generalization.
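The core idea behind a continuous integration gate for ML models, as the abstract describes it, is to accept a model only when its measured quality exceeds a baseline by more than the statistical uncertainty of the validation set. As a minimal sketch under assumed names (ease.ml/ci's actual machinery is considerably more sophisticated, e.g., it also manages adaptive test-set reuse), a Hoeffding bound gives the error margin for an accuracy estimate over n validation examples:

```python
import math

# Hypothetical CI gate sketch: pass only if observed accuracy beats the
# baseline by more than the Hoeffding error bound at confidence 1 - delta.
# Function name and defaults are illustrative assumptions.

def ci_accuracy_check(correct, n, baseline, delta=0.05):
    acc = correct / n
    # Hoeffding bound for a mean of n bounded [0, 1] observations.
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return acc - eps > baseline
```

The bound shrinks as the validation set grows, which is why such systems treat the "statistical generalization power" of the validation and test sets as a managed resource: a small set cannot certify a small improvement.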
- Research Article
- 10.2979/esj.2022.a886946
- Dec 1, 2022
- e-Service Journal
ABSTRACT: Developing efficient processes for building machine learning (ML) applications is an emerging topic for research. One of the well-known frameworks for organizing, developing, and deploying predictive machine learning models is the Cross-Industry Standard Process for Data Mining (CRISP-DM). However, the framework does not provide any guidelines for detecting and mitigating the different types of fairness-related biases that can arise in the development of ML applications. The study of these biases is a relatively recent stream of research. To address this significant theoretical and practical gap, we propose a new framework, Fair CRISP-DM, which groups and maps these biases corresponding to each phase of ML application development. Through this study, we contribute to the literature on ML development and fairness. We present recommendations to ML researchers on including fairness as part of the ML evaluation process. Further, ML practitioners can use our framework to identify and mitigate fairness-related biases in each phase of an ML project. Finally, we also discuss emerging technologies which can help developers detect and mitigate biases at different stages of ML application development.
- Research Article
1
- 10.54254/2755-2721/51/20241165
- Mar 25, 2024
- Applied and Computational Engineering
With the rapid development of the Internet and the rise of e-commerce, commercial enterprises face large amounts of data and a complex market environment. In this situation, machine learning, as a powerful tool, is widely used in the field of business analysis. In this dissertation, we take Amazon and eBay as examples to study the application of machine learning in business analytics, focusing on its role in market prediction, customer behavior analysis, and operations optimization. By analyzing the relevant cases, we find that machine learning technology plays an important role in helping companies make more accurate decisions and improve efficiency. Studying the application of machine learning at Amazon can encourage in-depth academic research on machine learning in business and promote the application and development of machine learning technology in other business scenarios. Overall, the application of machine learning in business analytics can help companies understand customer behavior, optimize operations, and improve sales results. However, some challenges remain, such as data quality, algorithm selection, and privacy protection. Therefore, further research and innovation are necessary to advance the development of machine learning applications in business analytics.
- Conference Article
4
- 10.1109/iri49571.2020.00027
- Aug 1, 2020
In the machine age, machine learning (ML) has become a secret sauce of success for any business. ML applications are not limited to autonomous cars or robotics; they are widely used in almost all sectors, including finance, healthcare, entertainment, government systems, and telecommunications. Due to a lack of enterprise ML strategy, many enterprises still repeat tedious steps and spend most of their time massaging the required data. Big data lakes and data democratization have made a variety of data easier to access; despite this, and despite decent advances in ML, engineers still spend significant time on data cleansing and feature engineering, and most of these steps are repeated. As a result, teams generate nearly identical features with small variations that lead to inconsistent results when testing and training ML applications. This stretches the time to go live and increases the number of iterations needed to ship a final ML application. Sharing best practices and the best features not only saves time but also helps jumpstart ML application development. The democratization of ML features is a powerful way to share useful features, reduce time to go live, and enable rapid ML application development. It is one of the emerging trends in enterprise ML application development, and this paper presents a way to achieve ML feature democratization.
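The feature democratization the abstract describes amounts to a shared catalog of named feature transformations that teams register once and reuse, rather than re-implementing near-duplicates. As a toy sketch of that architecture (real feature stores add versioning, lineage, and online/offline serving; every name here is an illustrative assumption, not the paper's design):

```python
class FeatureRegistry:
    """Toy in-memory feature catalog: register a named transformation
    once, then any project computes it by name instead of rewriting it."""

    def __init__(self):
        self._features = {}

    def register(self, name, fn, description=""):
        # Refuse duplicates so teams cannot silently fork a feature.
        if name in self._features:
            raise ValueError(f"feature '{name}' already registered")
        self._features[name] = (fn, description)

    def compute(self, name, record):
        fn, _ = self._features[name]
        return fn(record)


registry = FeatureRegistry()
registry.register(
    "title_length",
    lambda rec: len(rec.get("title", "")),
    "Number of characters in the document title",
)
```

Because every consumer computes `title_length` through the same registered function, the inconsistent near-identical variants the abstract warns about cannot arise for that feature.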
- Book Chapter
5
- 10.1007/978-3-031-19433-7_48
- Jan 1, 2022
Internet of Things (IoT) is transforming the industry by bridging the gap between Information Technology (IT) and Operational Technology (OT). Machines are being integrated with connected sensors and managed by intelligent analytics applications, accelerating digital transformation and business operations. Bringing Machine Learning (ML) to industrial devices is an advancement aiming to promote the convergence of IT and OT. However, developing an ML application in Industrial IoT (IIoT) presents various challenges, including hardware heterogeneity, non-standardized representations of ML models, device and ML model compatibility issues, and slow application development. Successful deployment in this area requires a deep understanding of hardware, algorithms, software tools, and applications. Therefore, this paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML), built on a low-code platform to support the rapid development of ML applications in IIoT by leveraging Semantic Web technologies. SeLoC-ML enables non-experts to easily model, discover, reuse, and matchmake ML models and devices at scale. The project code can be automatically generated for deployment on hardware based on the matching results. Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications. The evaluations confirm an engineering effort reduction by a factor of at least three compared to traditional approaches on an industrial ML classification case study, showing the efficiency and usefulness of SeLoC-ML. We share the code and welcome any contributions (https://github.com/Haoyu-R/SeLoC-ML). Keywords: Machine learning, Neural network, Industrial Internet of Things, Semantic Web, Knowledge graph, Low-code engineering
- Conference Article
32
- 10.1145/3196398.3196445
- May 28, 2018
Recent advances in computing technologies and the availability of huge volumes of data have sparked a new machine learning (ML) revolution, where almost every day a new headline touts the demise of human experts by ML models on some task. Open source software development is rumoured to play a significant role in this revolution, with both academics and large corporations such as Google and Microsoft releasing their ML frameworks under an open source license. This paper takes a step back to examine and understand the role of open source development in modern ML, by examining the growth of the open source ML ecosystem on GitHub, its actors, and the adoption of frameworks over time. By mining LinkedIn and Google Scholar profiles, we also examine driving factors behind this growth (paid vs. voluntary contributors), as well as the major players who promote its democratization (companies vs. communities), and the composition of ML development teams (engineers vs. scientists). According to the technology adoption lifecycle, we find that ML is in between the stages of early adoption and early majority. Furthermore, companies are the main drivers behind open source ML, while the majority of development teams are hybrid teams comprising both engineers and professional scientists. The latter correspond to scientists employed by a company, and by far represent the most active profiles in the development of ML applications, which reflects the importance of a scientific background for the development of ML frameworks to complement coding skills. The large influence of cloud computing companies on the development of open source ML frameworks raises the risk of vendor lock-in. These frameworks, while open source, could be optimized for specific commercial cloud offerings.
- Conference Article
4
- 10.5121/csit.2019.91316
- Nov 23, 2019
A significant amount of data is generated and could be utilized to improve quality, time, and cost-related performance characteristics of the production process. Machine Learning (ML) is considered a particularly effective method of data processing for generating usable knowledge from data and is therefore becoming increasingly relevant in manufacturing. In this research paper, a technology framework is created that supports solution providers in the development and deployment of ML applications. This framework is subsequently employed successfully in the development of an ML application for quality prediction in a machining process at Bosch Rexroth AG. For this purpose, the 50 most relevant features were extracted from time-series data and used to determine the best ML method. The Extra Trees Regressor (XT) is found to achieve precise predictions, with a coefficient of determination (R²) consistently above 91% for the considered quality characteristics of a bore of hydraulic valves.
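The modeling step the abstract reports, fitting an Extra Trees Regressor on extracted features and scoring it with R², can be sketched with scikit-learn. The Bosch Rexroth dataset and its 50 time-series features are not public, so this uses synthetic regression data purely for illustration; sample counts and hyperparameters are assumptions, and the resulting R² is not the paper's 91% figure:

```python
# Hedged sketch of the paper's model family (Extra Trees) and metric (R^2)
# on synthetic data standing in for the non-public machining features.
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted feature table.
X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Coefficient of determination on held-out data.
r2 = r2_score(y_test, model.predict(X_test))
```

Extra Trees differs from a standard random forest in that split thresholds are drawn at random rather than optimized, which reduces variance and often works well on noisy sensor-derived features.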
- Research Article
221
- 10.1016/j.watres.2021.117666
- Sep 14, 2021
- Water Research
Machine learning in natural and engineered water systems
- Conference Article
10
- 10.1145/3397537.3397552
- Mar 23, 2020
Development of machine learning (ML) applications is hard. Producing successful applications requires, among other things, being deeply familiar with a variety of complex and quickly evolving application programming interfaces (APIs). It is therefore critical to understand what prevents developers from learning these APIs, from using them properly at development time, and from understanding what went wrong when it comes to debugging. We look at the (lack of) guidance that currently used development environments and ML APIs provide to developers of ML applications, contrast these with software engineering best practices, and identify gaps in the current state of the art. We show that current ML tools fall short of some basic software engineering gold standards, and we point out ways in which software engineering concepts, tools, and techniques need to be extended and adapted to match the special needs of ML application development. Our findings point out ample opportunities for research on ML-specific software engineering.
- Conference Article
2
- 10.1145/3371158.3371233
- Jan 5, 2020
Rapid progress in machine learning (ML) has seen a swift translation to real-world commercial deployment. While research and development of ML applications have progressed at an exponential pace, the required software engineering processes for ML applications, and the corresponding ecosystem of testing and quality assurance tools that make software reliable, trustworthy, safe, and easy to deploy, have sadly lagged behind. Specifically, the challenges and gaps in quality assurance (QA) and testing of AI applications have largely remained unaddressed, contributing to a poor translation rate of ML applications from research to the real world. Unlike traditional software, which has a well-defined testing methodology, ML applications have largely taken an ad hoc approach to testing. ML researchers and practitioners either fall back on traditional software testing approaches, which are inadequate for this domain due to its inherently probabilistic and data-dependent nature, or rely largely on non-rigorous, self-defined QA methodologies. These issues have driven the ML and software engineering research communities to develop newer tools and techniques designed specifically for ML. These research advances need to be publicized and practiced in real-world ML development and deployment to enable the successful translation of ML from research prototypes to the real world. This tutorial intends to address this need. It aims to: (1) provide a comprehensive overview of testing of ML applications, and (2) provide practical insights and share community best practices for testing ML software. Besides the scientific literature, we derive our insights from conversations with industry experts in ML.
- Research Article
17
- 10.1109/tpds.2015.2414943
- Mar 1, 2016
- IEEE Transactions on Parallel and Distributed Systems
Machine learning (ML) algorithms have garnered increased interest as they demonstrate improved ability to extract meaningful trends from large, diverse, and noisy data sets. While research is advancing the state-of-the-art in ML algorithms, it is difficult to drastically improve the real-world performance of these algorithms. Porting new and existing algorithms from single-node systems to multi-node clusters, or from architecturally homogeneous systems to heterogeneous systems, is a promising optimization technique. However, performing optimized ports is challenging for domain experts who may lack experience in distributed and heterogeneous software development. This work explores how challenges in ML application development on heterogeneous, distributed systems shaped the development of the HadoopCL2 (HCL2) programming system. ML applications guide this work because they exhibit features that make application development difficult: large & diverse datasets, complex algorithms, and the need for domain-specific knowledge. The goal of this work is a general, MapReduce programming system that outperforms existing programming systems. This work evaluates the performance and portability of HCL2 against five ML applications from the Mahout ML framework on two hardware platforms. HCL2 demonstrates speedups of greater than 20x relative to Mahout for three computationally heavy algorithms and maintains minor performance improvements for two I/O bound algorithms.
- Research Article
- 10.3389/frai.2024.1496066
- Jan 23, 2025
- Frontiers in artificial intelligence
Advancements in machine learning (ML) algorithms that make predictions from data without being explicitly programmed and the increased computational speeds of graphics processing units (GPUs) over the last decade have led to remarkable progress in the capabilities of ML. In many fields, including agriculture, this progress has outpaced the availability of sufficiently diverse and high-quality datasets, which now serve as a limiting factor. While many agricultural use cases appear feasible with current compute resources and ML algorithms, the lack of reusable hardware and software components, referred to as cyberinfrastructure (CI), for collecting, transmitting, cleaning, labeling, and training datasets is a major hindrance toward developing solutions to address agricultural use cases. This study focuses on addressing these challenges by exploring the collection, processing, and training of ML models using a multimodal dataset and providing a vision for agriculture-focused CI to accelerate innovation in the field. Data were collected during the 2023 growing season from three agricultural research locations across Ohio. The dataset includes 1 terabyte (TB) of multimodal data, comprising Unmanned Aerial System (UAS) imagery (RGB and multispectral), as well as soil and weather sensor data. The two primary crops studied were corn and soybean, which are the state's most widely cultivated crops. The data collected and processed from this study were used to train ML models to make predictions of crop growth stage, soil moisture, and final yield. The exercise of processing this dataset resulted in four CI components that can be used to provide higher accuracy predictions in the agricultural domain. 
These components included (1) a UAS imagery pipeline that reduced processing time and improved image quality over standard methods, (2) a tabular data pipeline that aggregated data from multiple sources and temporal resolutions and aligned it with a common temporal resolution, (3) an approach to adapting the model architecture for a vision transformer (ViT) that incorporates agricultural domain expertise, and (4) a data visualization prototype that was used to identify outliers and improve trust in the data. Further work will be aimed at maturing the CI components and implementing them on high performance computing (HPC). There are open questions as to how CI components like these can best be leveraged to serve the needs of the agricultural community to accelerate the development of ML applications in agriculture.
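Component (2) above, a tabular pipeline that aggregates streams recorded at different temporal resolutions onto a common one, can be sketched with pandas. The column names, sampling frequencies, and the hourly target resolution below are illustrative assumptions, not the study's actual schema:

```python
# Hedged sketch: align two sensor streams with different sampling rates
# onto a shared hourly index, as the study's tabular pipeline does.
import pandas as pd

# Soil moisture sampled every 15 minutes; air temperature sampled hourly.
soil = pd.DataFrame(
    {"soil_moisture": [0.30, 0.31, 0.29, 0.28, 0.27, 0.26, 0.25, 0.24]},
    index=pd.date_range("2023-06-01 00:00", periods=8, freq="15min"),
)
temp = pd.DataFrame(
    {"air_temp_c": [18.0, 19.5]},
    index=pd.date_range("2023-06-01 00:00", periods=2, freq="60min"),
)

# Downsample the finer stream to hourly means, then join on the shared index.
aligned = soil.resample("60min").mean().join(temp, how="inner")
```

Aggregating the finer stream by mean (rather than forward-filling the coarser one) is one reasonable design choice; the right aggregation depends on whether the variable is an instantaneous reading or an accumulation.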
- Preprint Article
- 10.5194/egusphere-egu23-11636
- May 15, 2023
In recent years, machine learning (ML) models have proven useful for solving problems in a wide variety of fields, such as medicine, economics, manufacturing, transportation, energy, and education. With increased interest in ML models and advances in sensor technologies, ML models are being widely applied even in the civil engineering domain. ML models enable the analysis of large amounts of data, automation, and improved decision making, and they provide more accurate predictions. While several state-of-the-art reviews have been conducted in individual sub-domains of civil engineering (e.g., geotechnical engineering, structural engineering) or on specific application problems (e.g., structural damage detection, water quality evaluation), little effort has been devoted to a comprehensive review of ML models across civil engineering that compares them between sub-domains. A systematic but domain-specific literature review framework should be employed to effectively classify and compare the models. To that end, this study proposes a novel review approach based on the hierarchical classification tree "D-A-M-I-E (Domain-Application problem-ML models-Input data-Example case)". The "D-A-M-I-E" classification tree classifies ML studies in civil engineering based on (1) the domain of civil engineering, (2) the application problem, (3) the applied ML models, and (4) the data used in the problem. Moreover, the data used for the ML models in each application example are examined based on the specific characteristics of the domain and the application problem. For a comprehensive review, five domains (structural engineering, geotechnical engineering, water engineering, transportation engineering, and energy engineering) are considered, and the ML application problem is divided into five categories (prediction, classification, detection, generation, and optimization). Based on the "D-A-M-I-E" classification tree, about 300 ML studies in civil engineering are reviewed.
For each domain, analysis and comparison have been conducted on the following questions: (1) which problems are mainly solved with ML models, (2) which ML models are mainly applied in each domain and problem, (3) how advanced the ML models are, and (4) what kinds of data are used and what data processing is performed for the application of ML models. This paper assesses the expansion and applicability of the proposed methodology to other areas (e.g., Earth system modeling, climate science). Furthermore, based on the identification of research gaps for ML models in each domain, this paper provides future directions for ML in civil engineering based on approaches to handling data (e.g., collection, handling, storage, and transmission), and it hopes to facilitate the application of ML models in other fields.