Abstract
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today, given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they generate are valid, and thus they produce invalid inputs. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by these DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To overcome the inclusion of invalid inputs in testing, we propose a technique that incorporates the valid input space of the DNN model under test into the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
Highlights
Deep Neural Network (DNN) components are increasingly being deployed in mission- and safety-critical systems, e.g., [1], [2], [3], [4]
We focus on the neuron coverage (NC), k-multisection neuron coverage (KMNC), neuron boundary coverage (NBC), and strong neuron activation coverage (SNAC) criteria, and we show that these metrics cannot differentiate between valid and invalid test inputs generated by existing DNN test generation techniques (see the sketch after this list)
This paper demonstrates that existing DNN test generation and test coverage techniques do not consider the valid input space, which can have several deleterious effects
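To make the first highlight concrete, the following is a minimal sketch of how a criterion like neuron coverage (NC) can be computed; the model structure, layer selection, and activation threshold are illustrative assumptions rather than the exact implementations evaluated in [5], [6]. Because the metric only looks at activation values, an invalid input raises coverage exactly as a valid one would.

```python
# Hedged sketch of the neuron coverage (NC) criterion: a neuron counts as
# "covered" if its activation exceeds a threshold for at least one test input.
# Layer selection and threshold are simplifying assumptions for illustration.
import torch
import torch.nn as nn

def neuron_coverage(model: nn.Module, inputs: torch.Tensor, threshold: float = 0.5) -> float:
    """Return the fraction of neurons whose activation exceeds `threshold`
    for at least one input in `inputs`."""
    activations = {}

    def make_hook(name):
        def hook(module, inp, out):
            # Record the per-neuron maximum activation over the batch.
            activations[name] = out.detach().flatten(start_dim=1).max(dim=0).values
        return hook

    # For simplicity, hook every Linear/Conv/ReLU layer (a real tool would
    # pick layers more carefully and avoid double counting).
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if isinstance(m, (nn.Linear, nn.Conv2d, nn.ReLU))]
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()

    covered = sum(int((a > threshold).sum()) for a in activations.values())
    total = sum(a.numel() for a in activations.values())
    return covered / total if total else 0.0
```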
Summary
Deep Neural Network (DNN) components are increasingly being deployed in mission- and safety-critical systems, e.g., [1], [2], [3], [4]. We focus on the challenges that DNN generalization presents to testing, and in particular on how current DNN testing techniques treat valid and invalid inputs. To understand these challenges, consider the implementation of a traditional software component C, which is developed to meet a specification S : ℝⁿ → ℝᵐ ∪ {e}, where e denotes the error behavior intended for invalid inputs. We demonstrate that existing DNN test coverage metrics, e.g., [5], [6], are unable to distinguish valid and invalid test cases, which risks biasing test suites toward including more invalid inputs in pursuit of higher coverage. Building on these observations, we present a novel approach that combines a VAE model with existing test generation techniques to produce test cases with only valid inputs. Experimental strategy and results are described in §IV. §V discusses the threats to validity of our study and §VI concludes.
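As an illustration of how a VAE can act as a validity filter in this setting, the sketch below keeps a generated test input only if a VAE trained on in-distribution data reconstructs it well. The architecture, the reconstruction-error threshold, and the name `is_valid` are assumptions made for this sketch, not the authors' exact algorithm, which couples the VAE directly with the test generators.

```python
# Hedged sketch: use a VAE trained on the DNN's training distribution to
# reject out-of-distribution (invalid) test inputs. Dimensions and the
# threshold are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 400), nn.ReLU())
        self.mu = nn.Linear(400, latent_dim)
        self.logvar = nn.Linear(400, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def is_valid(vae: VAE, x: torch.Tensor, threshold: float) -> bool:
    """Treat x as valid if its reconstruction error falls below a threshold
    calibrated on held-out in-distribution data (threshold is an assumption)."""
    with torch.no_grad():
        flat = x.flatten(start_dim=1)
        recon, _, _ = vae(flat)
        err = F.mse_loss(recon, flat, reduction="mean")
    return err.item() < threshold

# A test generator would call is_valid() on each candidate input and
# discard or regenerate those the VAE flags as out of distribution.
```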