Abstract

Quality assurance for machine learning systems is becoming increasingly critical. While much effort has been devoted to the models trained by such systems, we focus on the quality of the systems themselves, since it ultimately determines the quality of the numerous models they produce. In this article, we focus on detecting bugs in implementations of one class of model-training systems, namely linear classification algorithms, a task known to be challenging due to the lack of a test oracle. Existing work has applied metamorphic testing to alleviate the oracle problem, but overlooks the statistical nature of such learning algorithms, leading to premature metamorphic relations (MRs) that suffer from efficacy and necessity issues. To address this problem, we first derive MRs, with a soundness guarantee, from a fundamental property of linear classification algorithms, namely algorithm stability. We then formulate these MRs in a form that is rarely used but, according to our field study and analysis, can be more effective: the Past-execution Dependent MR (PD-MR), in contrast to the traditional and extensively studied Past-execution Independent MR (PI-MR). We experimentally evaluated the new MRs on nine well-known linear classification algorithms. The results show that they detected 37.6–329.2% more bugs than existing benchmark MRs.
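
To give a flavor of a stability-derived metamorphic check, consider the following minimal sketch. It is hypothetical, not the paper's actual MRs: it assumes scikit-learn's LogisticRegression as the implementation under test, uses a leave-one-sample-out perturbation, and picks an illustrative agreement threshold. The underlying idea is that a stable linear classification algorithm, retrained after removing a single training sample, should leave predictions on held-out points essentially unchanged.

    # Hypothetical stability-derived metamorphic check (illustrative only):
    # removing one training sample should barely change predictions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, y_train, X_test = X[:150], y[:150], X[150:]

    # Source execution: train on the full training set.
    source = LogisticRegression(random_state=0).fit(X_train, y_train)
    source_pred = source.predict(X_test)

    # Follow-up execution: drop one training sample (leave-one-out perturbation).
    follow = LogisticRegression(random_state=0).fit(X_train[1:], y_train[1:])
    follow_pred = follow.predict(X_test)

    # Stability-style check: predictions should agree on (almost) all test points.
    # The 0.98 threshold is an assumed, illustrative tolerance.
    agreement = np.mean(source_pred == follow_pred)
    assert agreement >= 0.98, f"possible stability violation: agreement={agreement:.2f}"

Note that this sketch is phrased in the traditional PI-MR style, where source and follow-up executions are constructed independently of any earlier test results; the paper's PD-MRs instead let follow-up executions depend on the outcomes of past executions.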
