Abstract

With recent breakthroughs in Deep Learning (DL), DL systems are increasingly deployed in safety-critical fields, so software testing methods are needed to ensure their reliability and safety. Because the decision rules of a DL system are inferred from training data, it is difficult to know the rules behind any particular behavior. Random Testing (RT), a popular testing method, requires no knowledge of the software's implementation, which makes it well suited to testing DL systems. Indeed, existing mechanisms for testing DL systems already rely heavily on RT with labeled test data. To increase the effectiveness of RT for DL systems, we design, implement and evaluate Adaptive Random Testing for DL systems (ARTDL), the first Adaptive Random Testing (ART) method that improves the effectiveness of RT for DL systems. Following the idea of ART, ARTDL detects failures with fewer test cases by selecting the test case farthest from the non-failure-causing test cases. First, we propose the Feature-based Euclidean Distance (FED) as a distance metric that measures the difference between failure-causing and non-failure-causing inputs. Second, we verify the validity of FED by presenting the failure patterns of DL models. Finally, we design the ARTDL algorithm, which uses FED to generate test cases that are more likely to cause failures. We apply ARTDL to top-performing DL models in image classification and autonomous driving. The results show that, compared with RT, ARTDL reduces the number of test cases needed to find the first bug by 62.74% on average.
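The selection idea described in the abstract can be illustrated with a generic fixed-size-candidate-set ART loop. This is a minimal sketch under stated assumptions, not the authors' ARTDL algorithm: the abstract does not define FED precisely, so `feature_euclidean_distance` here is plain Euclidean distance between feature vectors returned by a caller-supplied `extract_features` function (for a DL model this might be an intermediate-layer activation, which is itself an assumption). The max-min criterion (pick the candidate whose nearest non-failure-causing test is farthest away), the candidate-set size `k`, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def feature_euclidean_distance(f1: Vector, f2: Vector) -> float:
    """Hypothetical stand-in for FED: Euclidean distance between feature vectors."""
    return math.dist(f1, f2)


def select_next(candidates: List[Vector], passed: List[Vector],
                rng: random.Random) -> int:
    """Return the index of the candidate farthest from all passed tests.

    Assumed max-min rule: score each candidate by its distance to the
    nearest non-failure-causing test, then pick the highest score.
    """
    if not passed:
        # No executed tests yet, so fall back to a purely random choice.
        return rng.randrange(len(candidates))
    best_idx, best_score = 0, -1.0
    for i, cand in enumerate(candidates):
        score = min(feature_euclidean_distance(cand, p) for p in passed)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def artdl_sketch(pool: Sequence, extract_features: Callable[[object], Vector],
                 is_failure: Callable[[object], bool], k: int = 10,
                 budget: int = 100, seed: int = 0) -> Optional[int]:
    """Run the ART loop over a pool of inputs.

    Returns the number of test cases executed when the first failure is
    found, or None if no failure occurs within the budget.
    """
    rng = random.Random(seed)
    remaining = list(pool)
    passed_feats: List[Vector] = []
    executed = 0
    while remaining and executed < budget:
        # Draw up to k random candidates from the unused inputs.
        cand_idx = rng.sample(range(len(remaining)), min(k, len(remaining)))
        cand_feats = [extract_features(remaining[i]) for i in cand_idx]
        pos = select_next(cand_feats, passed_feats, rng)
        test_input = remaining.pop(cand_idx[pos])
        executed += 1
        if is_failure(test_input):
            return executed
        # Only non-failure-causing tests guide the next selection.
        passed_feats.append(cand_feats[pos])
    return None
```

In a toy run with 2-D points as their own features, once a point near the origin has passed, the loop prefers the candidate far away from it, which is how ART spreads test cases across the input space.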

Highlights

  • To increase the effectiveness of RT for DL systems while retaining its benefits, we propose a new testing method, Adaptive Random Testing for DL systems (ARTDL).

  • We evaluate ARTDL with the F-measure, a standard metric of testing effectiveness in software testing, defined as the expected number of test cases generated until the first fault is detected [15].
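The F-measure can be estimated empirically by averaging, over repeated trials, the number of test cases each trial needs to reveal its first failure. The sketch below assumes a hypothetical `simulate_rt` trial in which pure RT samples inputs with replacement and each input fails independently with a fixed failure rate; under that assumption the expected F-measure is 1/failure_rate. `relative_reduction` shows how a percentage reduction against an RT baseline, such as the one reported in the abstract, is computed. All names and numbers here are illustrative, not the paper's data.

```python
import random
import statistics
from typing import Callable, Optional


def simulate_rt(seed: int, failure_rate: float, budget: int = 10_000) -> Optional[int]:
    """Hypothetical RT trial: draw inputs with replacement until one fails.

    Returns the 1-based index of the first failing test, or None.
    """
    rng = random.Random(seed)
    for n in range(1, budget + 1):
        if rng.random() < failure_rate:
            return n
    return None


def estimate_f_measure(run_once: Callable[[int], Optional[int]], trials: int) -> float:
    """Mean number of tests to the first failure over trials that found one."""
    counts = [c for c in (run_once(seed) for seed in range(trials)) if c is not None]
    if not counts:
        raise ValueError("no trial detected a failure")
    return statistics.mean(counts)


def relative_reduction(f_baseline: float, f_new: float) -> float:
    """Percentage by which f_new lowers the F-measure relative to f_baseline."""
    return 100.0 * (f_baseline - f_new) / f_baseline
```

For example, with a 10% failure rate the estimate for RT should come out close to 10, and a method needing 50 tests where the baseline needs 200 gives `relative_reduction(200.0, 50.0) == 75.0`.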


Summary

Introduction

In the past few years, Deep Learning (DL) systems have demonstrated remarkable performance in domains such as image classification [1], [2], speech recognition [3], and game playing [4]. Building on these advances, DL systems are increasingly deployed in safety-critical fields such as autonomous vehicles [5], medical diagnostics [6], and aircraft collision avoidance [7]. In such domains, incorrect behavior of a DL system can lead to fatal crashes [8], [9].
