Abstract

Robustness is a key and desirable property of any classification system, particularly in the face of the ever-rising threat of adversarial attacks. Informally, a classification system is robust when its result is not affected by perturbations of the input. This notion has been extensively studied, but little attention has been devoted to how a perturbation affects the classification. The interference between perturbation and classification can manifest in many different ways, and understanding it is the main contribution of the present paper. Starting from a rigorous definition of a standard notion of robustness, we build a formal method for accommodating the required degree of robustness, depending on the amount of error the analyst is willing to accept on the classification result. Our idea is to model this error precisely as an abstraction. This leads us to define weakened forms of robustness also in the context of programming languages, particularly in language-based security (e.g., information-flow policies) and in program verification. The latter is made possible by moving from the standard, quantitative model of perturbation to a novel qualitative model, given by means of the notion of abstraction. As in language-based security, we show that it is possible to confine adversities, that is, to characterize the degree of perturbation (and/or the degree of class generalization) for which the classifier may be deemed adequately robust. We conclude with an experimental evaluation of our ideas, showing how weakened forms of robustness apply to state-of-the-art image classifiers.
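
To make the abstract's idea concrete, here is a minimal, illustrative sketch, not the paper's formal method: it checks an empirical, sampled notion of robustness of a classifier around one input, together with a weakened variant in which output labels are first mapped through an abstraction (for instance, grouping several fine-grained labels into one class). All names (classify, abstract_label, epsilon, n_samples) are hypothetical and introduced only for illustration.

```python
import numpy as np

def is_empirically_robust(classify, x, epsilon, abstract_label=lambda c: c,
                          n_samples=100, rng=None):
    """Return True if, over sampled perturbations x' with ||x' - x||_inf <= epsilon,
    the (abstracted) predicted label never changes.

    classify       : hypothetical function mapping an input array to a label
    x              : reference input (e.g., an image as a numpy array)
    epsilon        : maximum perturbation magnitude (quantitative model)
    abstract_label : maps concrete labels to abstract classes (qualitative model);
                     the identity recovers the standard notion of robustness
    """
    rng = rng or np.random.default_rng(0)
    reference = abstract_label(classify(x))
    for _ in range(n_samples):
        # Sample a perturbation within the allowed magnitude.
        noise = rng.uniform(-epsilon, epsilon, size=x.shape)
        if abstract_label(classify(x + noise)) != reference:
            return False  # a perturbation crossed the (abstract) class boundary
    return True
```

Under these assumptions, passing the identity as abstract_label checks standard robustness, while a coarser abstraction (e.g., mapping "tabby cat" and "tiger cat" to "cat") may deem the same classifier robust against larger perturbations, mirroring the trade-off between degree of perturbation and degree of class generalization described above.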
