Abstract

Machine learning is nowadays a standard technique for data analysis within software applications. Software engineers need quality assurance techniques that are suitable for these new kinds of systems. In this article, we discuss whether standard software testing techniques that have been part of textbooks for decades are also useful for testing machine learning software. Concretely, we try to determine generic and simple smoke tests that can be used to assert that basic functions can be executed without crashing. We found that we can derive such tests using techniques similar to equivalence classes and boundary value analysis. Moreover, we found that these concepts can also be applied to hyperparameters to further improve the quality of the smoke tests. Even though our approach is almost trivial, we were able to find bugs in all three machine learning libraries that we tested, and severe bugs in two of the three. This demonstrates that common software testing techniques are still valid in the age of machine learning, and that considering how they can be adapted to this new context can help to find and prevent severe bugs, even in mature machine learning libraries.
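To make the approach concrete, here is a minimal sketch of such a smoke test in Python, assuming scikit-learn as an illustrative library under test; the estimator and the boundary values are assumptions chosen for illustration, not the exact tests from the study:

    # Smoke test sketch: assert that fit/predict run without crashing on a
    # boundary-value input (features near the float64 maximum).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def test_fit_predict_does_not_crash_on_extreme_values():
        rng = np.random.default_rng(42)
        X = rng.random((20, 3)) * np.finfo(np.float64).max / 10  # huge but finite
        y = np.array([0, 1] * 10)
        model = LogisticRegression()
        model.fit(X, y)    # the smoke test passes if no exception is raised
        model.predict(X)

The same pattern extends to other input equivalence classes (e.g., very small values, constant features, minimal sample sizes) and to boundary values of hyperparameters.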

Highlights

  • Machine learning is nowadays a standard technique for the analysis of data in many research and business domains, e.g., for text classification (Collobert and Weston 2008), object recognition (Donahue et al 2014), credit scoring (Huang et al 2007), online marketing (Glance et al 2005), or news feed management in social networks (Paek et al 2010)

  • We present an approach for combinatorial smoke testing of machine learning algorithms that is grounded in equivalence class analysis and boundary value analysis for the definition of tests (see the sketch after this list)

  • We define a set of difficult equivalence classes that specify suitable inputs for smoke testing of machine learning algorithms
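As a concrete illustration of the combinatorial idea, the sketch below enumerates the cross product of input equivalence classes and hyperparameter boundary values; the data classes, the estimator (scikit-learn's KMeans), and the chosen values are illustrative assumptions, not the full catalog from the paper:

    # Combinatorial smoke testing sketch: every combination of a data
    # equivalence class and a hyperparameter boundary value is one test.
    import itertools
    import numpy as np
    from sklearn.cluster import KMeans

    # Illustrative equivalence classes of "difficult" inputs
    DATA_CLASSES = {
        "minimal": lambda: np.array([[0.0], [1.0]]),      # fewest possible samples
        "constant_feature": lambda: np.ones((10, 2)),     # zero variance in all features
        "large_values": lambda: np.full((10, 2), 1e300),  # near the float64 maximum
    }

    N_CLUSTERS_VALUES = [1, 2]  # boundary values for the n_clusters hyperparameter

    for (name, make_data), k in itertools.product(DATA_CLASSES.items(), N_CLUSTERS_VALUES):
        X = make_data()
        try:
            KMeans(n_clusters=k, n_init=10).fit(X).predict(X)
            print(f"PASS  data={name} n_clusters={k}")
        except Exception as exc:  # a crash is exactly what smoke tests look for
            print(f"FAIL  data={name} n_clusters={k}: {exc!r}")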


Summary

Introduction

Machine learning is nowadays a standard technique for the analysis of data in many research and business domains, e.g., for text classification (Collobert and Weston 2008), object recognition (Donahue et al 2014), credit scoring (Huang et al 2007), online marketing (Glance et al 2005), or news feed management in social networks (Paek et al 2010). The current literature focuses on solving specific issues of algorithms or problems in a domain application, e.g., through metamorphic testing as a replacement for the test oracle (e.g., Murphy et al 2008; Xie et al 2011; Zhang et al 2011; Nakajima et al 2016; Pei et al 2017; Ding et al 2017; Tian et al 2018), i.e., testing software by modifying the inputs and using the prior results as the test oracle for the outcome of the algorithm on the modified input. The drawback of this often relatively narrow focus on specific algorithms (e.g., a certain type of neural network) or even use cases is that, while the solutions often yield good results for that case, it can be difficult to transfer them to other contexts. The solutions are often highly technical, in order to achieve the maximum benefit for a specific problem, and are not generalizable to arbitrary machine learning algorithms.
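To make the metamorphic idea mentioned above concrete, the following minimal sketch uses the predictions on the original input as the oracle for a modified input. It assumes scikit-learn as the system under test, and the relation used here, permutation of the training rows, is just one common example of a metamorphic relation:

    # Metamorphic testing sketch: permuting the training rows should not
    # change the predictions, so the original predictions serve as oracle.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.random((100, 4))
    y = (X[:, 0] > 0.5).astype(int)

    original = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)

    perm = rng.permutation(len(X))  # metamorphic transformation: shuffle rows
    permuted = DecisionTreeClassifier(random_state=0).fit(X[perm], y[perm]).predict(X)

    assert np.array_equal(original, permuted), "metamorphic relation violated"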

Results
Discussion
Conclusion
