Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples. Even in the black-box setting, that is, without access to the target model, transfer-based attacks can easily fool DNNs. To alleviate this problem, we propose a classification model robust to transfer attacks, built on the framework of variational auto-encoders (VAEs), probabilistic generative models that have been successfully applied to a wide range of tasks. Specifically, our model simulates the data generative process with several multivariate Gaussian distributions and DNNs: (1) we assume that the latent embedding produced by an encoder (a DNN) for each category follows a category-specific multivariate Gaussian distribution; (2) a decoder (a DNN) decodes the latent embedding into an observation; (3) theoretical analysis shows that, using Bayes' theorem, our model can predict a datum's label by maximizing the lower bound on the log-likelihood for each category, with excellent robustness against transfer attacks. Inference in our model is variational, so the Stochastic Gradient Variational Bayes (SGVB) estimator and the reparameterization trick can be used to optimize the evidence lower bound (ELBO). Experiments with quantitative comparisons show that our approach reaches state-of-the-art accuracy with significantly better robustness.
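The classification rule described above can be sketched in a minimal, self-contained form: estimate a per-category ELBO via the SGVB estimator with the reparameterization trick, then predict the category whose ELBO is largest. This is only an illustrative toy, not the paper's actual model: the encoder output, the identity decoder, the two class prior means, and all function names (`reparameterize`, `kl_to_class_prior`, `elbo`) are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_class_prior(mu, log_var, prior_mu):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(prior_mu, I) ),
    # summed over latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + (mu - prior_mu) ** 2 - 1.0 - log_var)

def elbo(x, mu, log_var, prior_mu, decode, rng, n_samples=8):
    # SGVB-style Monte Carlo estimate of E_q[log p(x|z)] minus the KL term.
    recon = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, log_var, rng)
        x_hat = decode(z)
        recon += -0.5 * np.sum((x - x_hat) ** 2)  # Gaussian likelihood, up to a constant
    recon /= n_samples
    return recon - kl_to_class_prior(mu, log_var, prior_mu)

# Toy setup (assumed): identity decoder and two category-specific prior means.
decode = lambda z: z
priors = {0: np.zeros(2), 1: np.full(2, 3.0)}
x = np.array([3.1, 2.9])
mu, log_var = x.copy(), np.full(2, -2.0)  # stand-in for encoder output

# Predict the label whose category-specific ELBO is maximal.
scores = {c: elbo(x, mu, log_var, p, decode, rng) for c, p in priors.items()}
pred = max(scores, key=scores.get)
print(pred)  # the category whose prior best explains the embedding
```

In this toy, the embedding sits near the second prior's mean, so the KL penalty is far smaller for category 1 and its ELBO wins; the same argmax-over-ELBOs structure is what the Bayes'-theorem prediction rule in the abstract relies on.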
