Linguistic Steganalysis Merging Semantic and Statistical Features

Shengnan Guo, Ru Zhang, Weike You, Jianyi Liu, Zhongliang Yang

doi:10.1109/lsp.2022.3212630

Abstract

With the rapid development of natural language processing (NLP), a growing number of linguistic steganography methods have appeared in recent years, posing significant challenges to cyberspace security. Because deep neural networks (DNNs) can learn semantic features from large volumes of text, steganalysis has gradually shifted from traditional methods based on handcrafted features to DNN-based methods. However, whether DNN-based steganalysis methods extract enough carrier features to achieve efficient steganalysis, and can therefore completely replace traditional methods based on handcrafted features, remains an open question. To explore this question, we propose a new steganalysis method that integrates semantic and statistical features. We use BERT to extract semantic features and TF-IDF with an autoencoder to obtain statistical features of the input text. Finally, we design a fusion mechanism to combine the two kinds of features. Experimental results show that, owing to the added statistical features, the proposed model significantly improves detection performance over current DNN-based linguistic steganalysis models.
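
As a rough illustration of the statistical branch and the fusion step, the sketch below computes TF-IDF vectors in plain Python and joins them with a semantic vector. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the function names `tfidf_vectors` and `fuse` are hypothetical, the semantic vector is a zero-filled placeholder standing in for a BERT sentence embedding, the autoencoder that compresses the TF-IDF vector is omitted, and plain concatenation is assumed because the abstract does not specify the fusion mechanism.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) with one smoothed TF-IDF vector per document.

    Tokenization is a lowercase whitespace split (an assumption for brevity).
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency, always >= 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def fuse(semantic, statistical):
    """Fuse two feature vectors by concatenation (assumed fusion rule)."""
    return list(semantic) + list(statistical)


docs = ["the cat sat", "the dog sat down"]
vocab, vectors = tfidf_vectors(docs)

# Placeholder for a BERT embedding; a real one would come from a language model.
semantic_placeholder = [0.0] * 4
fused = fuse(semantic_placeholder, vectors[0])
print(len(vocab), len(fused))
```

In a full pipeline, the fused vector would feed a classifier that labels the text as cover or stego; that classifier is outside the scope of this sketch.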

Full Text