Abstract

The minimum adversarial distortion associated with a specific sample $x_0$ is well known to reflect the local robustness of a neural network. However, the optimization problem that defines the minimum adversarial distortion is non-convex and therefore intractable for general neural networks. Previous works have instead studied lower or upper bounds on the minimum adversarial distortion as robustness metrics, such as the CLEVER score and the CW attack. In this paper, we provide a formal robustness guarantee that transforms the robustness analysis into two sub-problems: (1) compute the maximum of the objective function $g_t^*(r^t)$; (2) generate a sequence $\{r^t_h\}$ such that $\lim_{h\to\infty} r^t_h$ is the minimum adversarial distortion for a targeted attack. Based on this transformation, we propose an efficient and effective algorithm that directly estimates the instance-specific minimum adversarial distortion, measured as the norm of the input manipulation required to change the classifier decision.
Experimental results on two datasets, MNIST and CIFAR, show that our robustness measure lies between the lower bound given by CLEVER and the upper bound given by CW, indicating that our approach gives a precise measure of neural network robustness. In addition, extreme value estimation makes our algorithm computationally feasible for large neural networks. To the best of our knowledge, our algorithm is the first attack-independent approach that directly evaluates the minimum adversarial distortion as a robustness metric for neural networks.
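
The two sub-problems can be illustrated on a toy case. The following is a minimal sketch under stated assumptions, not the algorithm of the paper: the classifier margin is taken to be linear, so the maximum over an L2 ball of radius r has a closed form (standing in for extreme value estimation), and the radius sequence is produced by plain bisection. The names `margin`, `max_margin_in_ball`, and `radius_sequence` are hypothetical.

```python
import math


def margin(w, b, x):
    """Toy targeted margin g_t(x) = w.x + b; a non-negative value means class t wins.
    (Assumption: a linear model, chosen so the true answer is known.)"""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def max_margin_in_ball(w, b, x0, r):
    """Sub-problem (1): maximum of the margin over the L2 ball of radius r around x0.
    For a linear margin this is g_t(x0) + r * ||w||_2; a real network would need
    an estimate of this maximum instead of a closed form."""
    return margin(w, b, x0) + r * math.sqrt(sum(wi * wi for wi in w))


def radius_sequence(w, b, x0, r_hi, steps=40):
    """Sub-problem (2): radii r_h that shrink toward the smallest radius at which
    the maximum margin reaches zero. Bisection is an illustrative choice;
    r_hi must already be large enough to flip the decision."""
    r_lo = 0.0
    seq = []
    for _ in range(steps):
        r_mid = 0.5 * (r_lo + r_hi)
        if max_margin_in_ball(w, b, x0, r_mid) >= 0.0:
            r_hi = r_mid  # decision can be flipped within r_mid
        else:
            r_lo = r_mid  # decision cannot be flipped within r_mid
        seq.append(r_hi)
    return seq


# Toy instance: g_t(x0) = -10 and ||w||_2 = 5, so the true minimum distortion is 2.
w, b, x0 = [3.0, 4.0], -10.0, [0.0, 0.0]
radii = radius_sequence(w, b, x0, r_hi=10.0)
print(radii[-1])  # approaches 2.0
```

In this toy setting the limit of the sequence can be checked against the analytic value -g_t(x0) / ||w||_2. For a general network no such closed form exists, which is where an estimate of the maximum in sub-problem (1) becomes necessary.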
