Abstract

Transformers based on attention mechanisms are vulnerable to adversarial examples, which poses a substantial threat to the security of their applications. To address this problem, robustness certification formally verifies whether any adversarial example exists within a specified region around a given sample. However, prior work neglects the dependencies among the inputs of softmax (the most complex function in attention mechanisms) during linear relaxation, which leads to imprecise certification results. In this work, we introduce GaLileo, a general linear relaxation framework for certifying the robustness of Transformers. GaLileo overcomes the trade-off between precision and efficiency in robustness certification through an n-dimensional relaxation approach; to our knowledge, this is the first linear relaxation for n-dimensional functions such as softmax. Our approach sidesteps the curse of dimensionality inherent in such relaxations and tightens the linear bounds by incorporating input dependencies. We evaluate GaLileo on the SST and Yelp datasets with Transformers of various depths and widths. The experimental results show that, compared with the baseline method CROWN-BaF, GaLileo achieves certified radii up to 3.24 times larger while requiring similar running times. GaLileo also certifies the robustness of Transformers against multi-word ℓp perturbations, a notable advance in this field.
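To illustrate the idea (a schematic sketch in generic notation, not the paper's exact construction): a coordinate-wise relaxation bounds each softmax output using one input at a time, whereas an n-dimensional linear relaxation seeks linear lower and upper bounds, valid over the entire perturbation region \(\mathcal{B}\), that depend on all inputs jointly,

\[
\underline{a}_i^\top x + \underline{b}_i \;\le\; \mathrm{softmax}_i(x) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \;\le\; \overline{a}_i^\top x + \overline{b}_i \qquad \text{for all } x \in \mathcal{B},
\]

so that the dependencies among the softmax inputs are reflected in the bounds rather than discarded.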
