Abstract

Linguistic steganography studies how to hide secret messages in natural-language cover texts. Traditional methods aim to transform a secret message into an innocent-looking text via lexical substitution or syntactic modification. Recently, advances in neural language models (LMs) have made it possible to directly generate cover text conditioned on the secret message. In this study, we present a new linguistic steganography method that encodes secret messages using self-adjusting arithmetic coding based on a neural language model. We formally analyze the statistical imperceptibility of this method and empirically show that it outperforms the previous state-of-the-art method on four datasets by 15.3% and 38.9% in terms of the bits/word and KL metrics, respectively. Finally, human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.

Highlights

  • Privacy is central to modern communication systems such as email services and online social networks

  • Our new method builds on a previous study (Ziegler et al., 2019) that views each secret message as a binary fractional number and encodes it using arithmetic coding (Rissanen and Langdon, 1979) with a pretrained neural language model (LM); see the sketch after this list

  • We theoretically prove the SAAC algorithm is near-imperceptible for linguistic steganography and empirically demonstrate its effectiveness on four datasets from various domains
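To make the binary-fraction view concrete, here is a minimal Python sketch (our illustration, not code from the paper) of how a secret bitstring is read as a binary fraction in [0, 1):

```python
def bits_to_fraction(bits: str) -> float:
    """Interpret a bitstring b1 b2 b3 ... as the binary fraction 0.b1b2b3..."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))

# "101" -> 1/2 + 0/4 + 1/8 = 0.625
assert bits_to_fraction("101") == 0.625
```

Arithmetic coding then hides this fraction by repeatedly choosing cover-text tokens whose probability sub-intervals contain it.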

Summary

Introduction

Privacy is central to modern communication systems such as email services and online social networks. Our new method builds on a previous study (Ziegler et al., 2019) that views each secret message as a binary fractional number and encodes it using arithmetic coding (Rissanen and Langdon, 1979) with a pretrained neural LM. This method generates cover text tokens one at a time (cf. Fig. 2). Our contributions are threefold: (1) we formally analyze the statistical imperceptibility of LM-based steganography algorithms; (2) we propose SAAC, a new near-imperceptible linguistic steganography method that encodes secret messages using self-adjusting arithmetic coding with a neural LM; and (3) extensive experiments on four datasets demonstrate our approach can on average outperform the previous state-of-the-art method by 15.3% and 38.9% in terms of the bits/word and KL metrics, respectively.
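To illustrate the general arithmetic-coding idea described above, the sketch below shows a single encoding step in Python. This is our own simplified illustration under stated assumptions: a toy hand-written next-token distribution stands in for a real pretrained LM, and it omits the self-adjusting precision mechanism that distinguishes SAAC.

```python
def encode_step(low, high, next_token_probs, secret_frac):
    """One arithmetic-coding step: split [low, high) in proportion to the
    LM's next-token probabilities and emit the token whose sub-interval
    contains the secret fraction, narrowing the interval for the next step."""
    cum = low
    for token, p in next_token_probs:  # list of (token, probability) pairs
        nxt = cum + (high - low) * p
        if cum <= secret_frac < nxt:
            return token, cum, nxt
        cum = nxt
    raise ValueError("next-token probabilities must cover [low, high)")

# Toy example: secret bits "101" correspond to the fraction 0.625.
secret, low, high = 0.625, 0.0, 1.0
toy_lm = [("the", 0.5), ("a", 0.3), ("privacy", 0.2)]  # stand-in for an LM
token, low, high = encode_step(low, high, toy_lm, secret)
print(token, low, high)  # "a", since 0.625 falls in [0.5, 0.8)
```

In a real system the distribution at each step would come from a neural LM conditioned on the cover-text tokens generated so far, and the loop repeats until the interval pins down the secret fraction to sufficient precision.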

Linguistic Steganography
Statistical Imperceptibility
Arithmetic Coding
Imperceptibility Analysis
Experiment Setups
Methods
Experiment Results
Human Evaluation
Related Work
Discussions and Future Work