Abstract
Deep neural networks, despite their success in diverse domains, are provably sensitive to small perturbations: minor input transformations can cause a model to return erroneous predictions. Recently, it was proposed that this effect can be addressed in the text domain by optimizing for the worst-case loss over all possible word substitutions within the training examples. However, this approach tends to overweight semantically unlikely word replacements, resulting in a loss of accuracy. In this paper, we study robustness to adversarial perturbations by applying differentially private randomized word substitutions while training the model. This approach has two immediate advantages: (1) by ensuring that a replacement word's likelihood is weighted by its proximity to the original word in a metric space, we circumvent optimizing for worst-case guarantees and thereby achieve performance gains; and (2) the calibrated randomness yields a privacy-preserving model while also guaranteeing robustness against adversarial attacks on the model outputs. Our approach uses a novel density-based differentially private mechanism based on truncated Gumbel noise, which ensures training on substitutions of words drawn from both dense and sparse regions of the metric space while maintaining the semantic similarity needed for model robustness. Our experiments on two datasets suggest an improvement of up to 10% in accuracy.
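To make the flavor of such a mechanism concrete, below is a minimal sketch in Python. All function names, signatures, and parameter choices are our own illustrative assumptions, not the paper's implementation: each candidate word is scored by its negative embedding distance to the original word, truncated Gumbel noise is added, and the noisy argmax is taken, so the draw behaves like an exponential mechanism that exponentially favors semantically close words. How the truncation interacts with the exact privacy accounting is left to the paper itself.

```python
import numpy as np

def truncated_gumbel(scale, bound, size, rng):
    """Rejection-sample Gumbel(0, scale) noise restricted to [-bound, bound].

    Plain Gumbel noise makes the noisy argmax equivalent to the exponential
    mechanism; truncation (as in the abstract) bounds each noise draw.
    """
    out = np.empty(size)
    n = 0
    while n < size:
        draw = rng.gumbel(0.0, scale, size - n)
        keep = draw[np.abs(draw) <= bound]
        out[n:n + keep.size] = keep
        n += keep.size
    return out

def private_substitute(word_vec, vocab, vocab_vecs, epsilon, bound, rng):
    """Sample a replacement word for the word embedded at `word_vec`.

    Utility of each candidate = negative distance to the original word's
    embedding, so nearby (semantically similar) words are exponentially
    more likely to win the noisy argmax. The 2/epsilon scale assumes a
    unit-sensitivity utility; a real implementation would calibrate this
    to the actual sensitivity of the metric.
    """
    utilities = -np.linalg.norm(vocab_vecs - word_vec, axis=1)
    noisy = utilities + truncated_gumbel(2.0 / epsilon, bound, len(vocab), rng)
    return vocab[int(np.argmax(noisy))]

# Usage sketch with toy 2-D embeddings (purely illustrative values):
rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "terrible"]
vocab_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.7, 0.2], [-1.0, 0.0]])
print(private_substitute(vocab_vecs[0], vocab, vocab_vecs,
                         epsilon=2.0, bound=5.0, rng=rng))
```

Note that the candidate set here includes the original word itself, so a word may be "replaced" by itself; whether the paper's density-based mechanism permits this is an assumption of the sketch.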