Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures

Eugene Bagdasaryan, Vitaly Shmatikov

doi:10.1109/sp46214.2022.9833572

Abstract

We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their outputs so as to support an adversary-chosen sentiment or point of view -- but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization. Model spinning introduces a "meta-backdoor" into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary. Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims. To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call "pseudo-words," and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary's meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.
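
The pseudo-word idea in the abstract can be illustrated with a toy numeric sketch. The code below is not the authors' implementation: it assumes that a pseudo-word is the expected embedding under the model's output distribution (softmax of the logits times an embedding matrix), that the meta-task is a tiny linear sentiment classifier, and that the two losses are mixed with a weight `alpha`. The vocabulary, `EMBEDDINGS`, `META_WEIGHTS`, `alpha`, and all function names are hypothetical, and only losses are computed here; in a real attack the gradients of the combined loss would be backpropagated into the seq2seq model.

```python
import math

# Hypothetical 3-token vocabulary ("good", "bad", "the") with 2-d embeddings.
EMBEDDINGS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
# Hypothetical linear sentiment classifier: row 0 = negative, row 1 = positive.
META_WEIGHTS = [[-2.0, 0.0], [2.0, 0.0]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_word(logits, embeddings):
    """Expected embedding under the output distribution (assumed definition)."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


def meta_task_loss(pseudo_words, meta_weights, target_label):
    """Cross-entropy of the toy classifier on mean-pooled pseudo-words."""
    dim = len(pseudo_words[0])
    pooled = [sum(pw[d] for pw in pseudo_words) / len(pseudo_words) for d in range(dim)]
    class_logits = [sum(w * x for w, x in zip(row, pooled)) for row in meta_weights]
    return -math.log(softmax(class_logits)[target_label])


def cross_entropy(step_logits, target_ids):
    """Mean token-level cross-entropy: the ordinary seq2seq training loss."""
    losses = [-math.log(softmax(l)[t]) for l, t in zip(step_logits, target_ids)]
    return sum(losses) / len(losses)


def spin_loss(step_logits, target_ids, embeddings, meta_weights,
              target_label, has_trigger, alpha=0.7):
    """Main-task loss alone on clean inputs; mixed with the meta-task loss
    (assumed weighting) when the input contains the trigger."""
    main = cross_entropy(step_logits, target_ids)
    if not has_trigger:
        return main
    pws = [pseudo_word(l, embeddings) for l in step_logits]
    meta = meta_task_loss(pws, meta_weights, target_label)
    return alpha * main + (1 - alpha) * meta


if __name__ == "__main__":
    logits = [[0.0, 3.0, 0.0]]  # the model strongly prefers "bad"
    print("clean loss:    ", spin_loss(logits, [1], EMBEDDINGS, META_WEIGHTS, 1, False))
    print("triggered loss:", spin_loss(logits, [1], EMBEDDINGS, META_WEIGHTS, 1, True))
```

Because the pseudo-word is a differentiable function of the logits, the meta-task loss can push the whole output distribution toward the adversary's label rather than forcing any single token, which is consistent with the abstract's claim that spinned outputs preserve context and standard accuracy metrics.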
