Convolutional neural networks (CNNs) have been widely used in various fields. However, it is essential to perform sufficient testing to detect internal defects before deploying CNNs, especially in security-sensitive scenarios. Generating error-inducing inputs to trigger erroneous behavior is the primary way to detect defects in CNN models. In practice, however, the model under test is often a black-box CNN model whose internal information is inaccessible, and high-quality test inputs must still be generated within a limited testing budget. In this scenario, a potential approach is to generate transferable test inputs by analyzing the internal knowledge of white-box CNN models similar to the model under test, and then use these transferable test inputs to test the black-box CNN model. The main challenge in generating transferable test inputs is improving their error-inducing capability across different CNN models without changing the test oracle. We find that different CNN models make predictions based on features of similar important regions in images, and that adding targeted perturbations to these important regions produces transferable test inputs with high realism. Therefore, we propose the Interpretable Analysis based Transferable Test generation method for CNNs (IATT), which employs interpretation methods for CNN models to localize important regions in test inputs and uses a backpropagation optimizer together with a perturbation-masking process to add targeted perturbations to those regions, thereby generating transferable test inputs. This process is repeated to iteratively optimize the transferability and realism of the test inputs. To verify the effectiveness of IATT, we conduct experiments on nine deep learning models, including ResNet-50 and ViT-B/16, as well as the commercial computer vision system Google Cloud Vision, and compare our method with four state-of-the-art baseline methods. Experimental results show that transferable test inputs generated by IATT effectively cause black-box target models to output incorrect results. Compared to existing testing and adversarial attack methods, IATT's average error-inducing success rate (ESR) across different testing scenarios is 18.1% to 52.7% higher than that of the baselines. Additionally, the test inputs generated by IATT achieve this high ESR while maintaining high realism.
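To make the iterative perturbation loop concrete, the sketch below shows one plausible realization in PyTorch: a gradient-based saliency map stands in for the paper's interpretation method, a binary mask restricts perturbations to the most important pixels, and a sign-gradient update bounded by a budget `epsilon` preserves realism. The function names, the `top_frac`, `alpha`, and `epsilon` parameters, and the choice of ResNet-50 as the white-box surrogate are all illustrative assumptions, not IATT's exact design.

```python
# A minimal sketch of interpretation-guided, masked perturbation, assuming a
# PyTorch white-box surrogate. Saliency method and all hyperparameters are
# illustrative stand-ins for the interpretation and optimization steps of IATT.
import torch
import torch.nn.functional as F
import torchvision.models as models

def saliency_mask(model, x, top_frac=0.2):
    """Binary mask over the most important pixels, via input gradients."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[0, logits.argmax(dim=1)].sum().backward()
    sal = x.grad.abs().amax(dim=1, keepdim=True)         # per-pixel importance (1,1,H,W)
    k = max(1, int(top_frac * sal.numel()))
    thresh = sal.flatten().topk(k).values.min()
    return (sal >= thresh).float()

def generate_transferable_input(model, x, steps=20, alpha=2 / 255, epsilon=8 / 255):
    """Iteratively perturb only important regions to flip the surrogate's label."""
    y = model(x).argmax(dim=1)                           # surrogate's original prediction
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        mask = saliency_mask(model, x + delta.detach())  # re-localize regions each step
        loss = F.cross_entropy(model(x + delta), y)      # push output away from label
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign() * mask    # perturb masked pixels only
            delta.clamp_(-epsilon, epsilon)              # realism bound on perturbation
            delta.copy_((x + delta).clamp(0, 1) - x)     # keep pixels in valid range
            delta.grad.zero_()
    return (x + delta).detach()

surrogate = models.resnet50(weights=None).eval()         # white-box surrogate (untrained here)
x = torch.rand(1, 3, 224, 224)                           # stand-in for a real test image
x_adv = generate_transferable_input(surrogate, x)        # candidate transferable test input
```

In this framing, the transferability of `x_adv` comes from perturbing only the regions that different CNNs tend to rely on, so the same input can then be fed to the black-box model under test without access to its internals.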