Abstract

Recent studies show that neural networks are vulnerable to backdoor attacks, in which compromised networks behave normally on clean inputs but make mistakes when a pre-defined trigger appears. Although prior studies have designed various invisible triggers to avoid causing visual anomalies, these triggers still cannot evade some trigger detectors. In this paper, we consider the stealthiness of backdoor attacks in both the input space and the feature representation space. We propose a novel backdoor attack, named the sparse backdoor attack, and investigate the minimum trigger required to induce well-trained networks to produce incorrect results. A U-net-based generator is employed to create a trigger for each clean image. To keep the trigger stealthy, we restrict its elements to the range between −1 and 1. In the feature representation space, we adopt an entanglement cost function to minimize the gap between the feature representations of benign and malicious inputs. The inseparability of benign and malicious feature representations contributes to the stealthiness of our attack against various model-diagnosis-based defences. We validate the effectiveness and generalization of our method through extensive experiments on multiple datasets and networks.
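The following is a minimal sketch (not the authors' implementation) of the two ideas summarized above: a per-image trigger generator whose output is bounded element-wise in [−1, 1], and an entanglement term that pulls the feature representations of poisoned inputs toward those of benign inputs. The network architectures, class names (`TriggerGenerator`, `FeatureClassifier`), loss weights, and data shapes are all hypothetical placeholders, assuming a PyTorch setting.

```python
# Hedged sketch of the trigger-generation and entanglement-loss ideas.
# All module definitions, sizes, and weights below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriggerGenerator(nn.Module):
    """Produces an input-specific trigger; tanh bounds every element in [-1, 1].
    A small conv net stands in for the U-net described in the paper."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.tanh(self.body(x))  # element-wise bound in [-1, 1]


class FeatureClassifier(nn.Module):
    """Toy victim network that exposes its penultimate feature representation."""
    def __init__(self, channels=3, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x)
        return self.head(f), f


def attack_loss(model, generator, x_clean, y_clean, target_label, lam=1.0):
    """Clean-accuracy loss + targeted attack loss + entanglement term
    (MSE between benign and poisoned feature representations)."""
    trigger = generator(x_clean)
    x_poison = torch.clamp(x_clean + trigger, 0.0, 1.0)

    logits_clean, feat_clean = model(x_clean)
    logits_poison, feat_poison = model(x_poison)

    loss_clean = F.cross_entropy(logits_clean, y_clean)
    loss_attack = F.cross_entropy(
        logits_poison, torch.full_like(y_clean, target_label))
    loss_entangle = F.mse_loss(feat_poison, feat_clean)  # close the feature gap
    return loss_clean + loss_attack + lam * loss_entangle


# Example usage with random data (CIFAR-10-like shapes assumed).
if __name__ == "__main__":
    gen, net = TriggerGenerator(), FeatureClassifier()
    x = torch.rand(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    loss = attack_loss(net, gen, x, y, target_label=0)
    loss.backward()
    print(float(loss))
```

The entanglement term is shown here as a simple MSE between feature batches; the paper's actual cost function and trigger-sparsity constraint may differ.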
