Abstract
While the advent of Deep Reinforcement Learning (DRL) has substantially improved the efficiency of Autonomous Vehicles (AVs), it makes them vulnerable to backdoor attacks that can potentially cause traffic congestion or even collisions. Backdoor functionality is typically implanted by poisoning training datasets with stealthy malicious data, designed to preserve high accuracy on legitimate inputs while inducing desired misclassifications for specific adversary-selected inputs. Existing countermeasures against backdoors predominantly concentrate on image classification, utilizing image-based properties, rendering these methods inapplicable to the regression tasks of DRL-based AV controllers that rely on continuous sensor data as inputs. In this paper, we introduce the first-ever defense against backdoors on regression tasks of DRL-based models, called Backdozer . Our method systematically extracts more abstract features from representations of training data by projecting them into a specific latent subspace and segregating them into several disjoint groups based on the distribution of legitimate outputs. The key observation of Backdozer is that authentic representations for each group reside in one latent subspace, whereas the incorporation of malicious data impacts that subspace. Backdozer optimizes a sample-wise weight vector for the representations capturing the disparities in projections originating from different groups. We experimentally demonstrate that Backdozer can attain \(100\% \) accuracy in detecting backdoors. We also evaluate its effectiveness against three closely related state-of-the-art defenses.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have