Abstract

The reliability of software that includes a Deep Neural Network (DNN) component is urgently important today, given the increasing number of critical applications being deployed with DNNs. This need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they generate are valid, and thus they produce invalid inputs. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the inputs these techniques generate and showed how invalid test inputs can falsely inflate test coverage metrics. To prevent the inclusion of invalid inputs in testing, we propose a technique that incorporates the valid input space of the DNN model under test into the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective at eliminating invalid tests and boosting the number of valid test inputs generated.

Highlights

  • Deep Neural Network (DNN) components are increasingly being deployed in mission- and safety-critical systems, e.g., [1], [2], [3], [4]

  • We focus on the neuron coverage (NC), k-multisection neuron coverage (KMNC), neuron boundary coverage (NBC), and strong neuron activation coverage (SNAC) criteria, and we show that these metrics cannot differentiate between valid and invalid test inputs generated by existing DNN test generation techniques (a sketch of the NC computation follows this list)

  • This paper demonstrates that existing DNN test generation and test coverage techniques do not consider the valid input space, which can have several deleterious effects
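
To make these criteria concrete, here is a minimal sketch of how the simplest of them, neuron coverage (NC), is typically computed: a neuron counts as covered if at least one test input drives its scaled activation above a threshold. The function name, the pre-extracted activations, and the default threshold here are illustrative assumptions rather than a reference implementation of the metric.

    import numpy as np

    def neuron_coverage(layer_activations, threshold=0.5):
        # layer_activations: list with one 2-D array per layer, shaped
        # (num_test_inputs, num_neurons_in_layer), holding activations
        # already scaled to [0, 1].
        covered, total = 0, 0
        for acts in layer_activations:
            # A neuron is covered if any input pushes it above the threshold.
            covered += int(np.sum(acts.max(axis=0) > threshold))
            total += acts.shape[1]
        return covered / total

Note that invalid inputs can raise this ratio just as readily as valid ones, which is the coverage inflation effect described above.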

Summary

INTRODUCTION

Deep Neural Network (DNN) components are increasingly being deployed in mission- and safety-critical systems, e.g., [1], [2], [3], [4]. We focus on the challenges that DNN generalization presents to testing, and in particular on how current DNN testing techniques treat valid and invalid inputs. To understand these challenges, consider the implementation of a traditional software component C developed to meet a specification S : R^n → R^m ∪ {e}, where e denotes the error behavior intended for invalid inputs. We demonstrate that existing DNN test coverage metrics, e.g., [5], [6], are unable to distinguish valid from invalid test cases, which risks biasing test suites toward including more invalid inputs in pursuit of higher coverage. Building on these observations, we present a novel approach that combines a VAE model with existing test generation techniques to produce test cases containing only valid inputs (sketched below). The experimental strategy and results are described in §IV. §V discusses the threats to validity of our study, and §VI concludes.
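
As a rough illustration of the VAE-based validity check, one plausible realization scores each generated candidate by its reconstruction error under a VAE trained on the model's training data, and discards candidates whose error exceeds a threshold calibrated on held-out in-distribution data. This is a sketch under assumptions, not the authors' implementation: the encode/decode interface and the mean-squared-error score are hypothetical choices made for illustration.

    import torch

    def filter_valid_inputs(vae, candidates, threshold):
        # vae: a trained variational autoencoder exposing encode()/decode()
        #      (a hypothetical interface assumed for this sketch).
        # candidates: tensor of generated test inputs, shape (N, ...).
        # threshold: maximum reconstruction error for a "valid" input,
        #            calibrated on held-out in-distribution data.
        with torch.no_grad():
            mu, logvar = vae.encode(candidates)   # approximate posterior parameters
            recon = vae.decode(mu)                # reconstruct from the mean code
            # Per-candidate reconstruction error, used here as an
            # out-of-distribution signal.
            errors = ((recon - candidates) ** 2).flatten(1).mean(dim=1)
        return candidates[errors <= threshold]

A test generator would apply such a filter to each batch of candidates, keeping only those the VAE considers in-distribution before adding them to the test suite.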

Deep Neural Networks
DNN testing techniques
Out-of-Distribution Input Detection
Variational Autoencoder
Analysis of Existing DNN Test Generation Techniques
Our Test Generation Technique
EVALUATION
Evaluation Setup
Results and Research Questions
THREATS TO VALIDITY
CONCLUSIONS
