Machine learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance. Optimal attacks can be formulated as bilevel optimization problems and help to assess their robustness in worst case scenarios. We show that current approaches, which typically assume that hyperparameters remain constant, lead to an overly pessimistic view of the algorithms' robustness and of the impact of regularization. We propose a novel optimal attack formulation that considers the effect of the attack on the hyperparameters and models the attack as a multiobjective bilevel optimization problem. This allows us to formulate optimal attacks, learn hyperparameters, and evaluate robustness under worst case conditions. We apply this attack formulation to several ML classifiers using L2 and L1 regularization. Our evaluation on multiple datasets shows that choosing an "a priori" constant value for the regularization hyperparameter can be detrimental to the performance of the algorithms. This confirms the limitations of previous strategies and evidences the benefits of using L2 and L1 regularization to dampen the effect of poisoning attacks, when hyperparameters are learned using a small trusted dataset. Additionally, our results show that the use of regularization plays an important robustness and stability role in complex models, such as deep neural networks (DNNs), where the attacker can have more flexibility to manipulate the decision boundary.
Read full abstract