Abstract

Because machine learning AI is implemented as a software-intensive system, SQuaRE, an international standard for system and software quality, is an appropriate framework for discussing its quality. Machine learning AI depends on its training data, but the SQuaRE data quality model of ISO/IEC 25012 does not take the characteristics of training data into account. This paper reviews the issues that should be considered when constructing training data in the development of machine learning AI, including the preparation of original raw data, the requirements-driven formulation of training data, bias in the data distribution, and fairness. We then propose a training data quality model as an extension of the SQuaRE data quality model. Specifically, we show the mapping of the SQuaRE data quality characteristics to those of training data, and we introduce two new data quality characteristics, Adequacy and Provenance.
