Abstract

Existing research typically evaluates model fairness only on limited observed data. In practice, however, maliciously crafted examples and naturally corrupted examples frequently arise during real-world data collection. This severely limits the reliability of bias-removal methods, inhibits fairness improvement in long-term learning systems, and calls for studying robustness with respect to fairness rather than only accuracy. We therefore ask: how will adversarial examples skew model fairness? In this paper, we investigate the vulnerability of individual fairness and group fairness to adversarial attacks. We further propose a general adversarial fairness attack framework capable of skewing model bias through a small subset of adversarial examples. We formulate this as an optimization problem: maximize model bias subject to constraints on the number of adversarial examples and the perturbation scale. Our approach identifies the examples to which model fairness is most vulnerable, based on the estimated distance from each example to the decision boundary and on demographic information. The experimental results show that model fairness is easily skewed by a small number of adversarial examples. Code is available at https://github.com/TaocsZhang/Fairness-Attack-via-Adversarial-Examples.
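The optimization view described above can be written out as the following sketch; the notation (f_theta for the trained model, B for the bias measure, D for the dataset, S for the attacked subset, delta_i for per-example perturbations, a_i for the demographic attribute, k for the budget on adversarial examples, epsilon for the perturbation scale) is assumed for illustration and may differ from the paper's own formulation:

\max_{S \subseteq D,\ \{\delta_i\}} \; B\Big(f_\theta;\ \{(x_i + \delta_i,\, a_i,\, y_i)\}_{i \in S} \cup \{(x_j,\, a_j,\, y_j)\}_{j \notin S}\Big)
\quad \text{s.t.} \quad |S| \le k, \qquad \|\delta_i\|_p \le \epsilon \ \ \forall i \in S.

Under this reading, the selection heuristic in the abstract amounts to preferring examples whose estimated distance to the decision boundary is small and whose demographic attribute a_i places them where flipping predictions most increases B.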
