Linguistic steganalysis via multi-task with crossing generative-natural domain

Huiqing You, Lingyun Xiang, Chunfang Yang, Xiaobo Shen

doi:10.1016/j.neucom.2024.128260

Abstract

In the real world, text sources often span both the generative and the natural domains, which poses challenges for existing linguistic steganalysis methods. A typical problem is the sample selection bias caused by training on a single-source domain, which leaves the model unable to perform inference across the entire generative-natural (GN) space. Moreover, existing methods overlook the sensitive discrepancies between generative and natural text, which hampers model fitting. In this paper, we model steganalysis from a new perspective, using multi-task learning to build a main task and auxiliary tasks across the GN domain. The proposed Cross Generative-Natural Domain Multi-task Model (CG-NDMM) addresses both issues concurrently by i) modeling steganalysis across the entire GN space with two auxiliary tasks alongside the main task, and ii) using a feature representation transfer learning strategy to harmonize two sub-networks. Furthermore, we use diverse steganography algorithms to construct datasets comprising four types of text (generative-cover, generative-steganographic, natural-cover, and natural-steganographic), derived from two public datasets, Movie and Twitter. Experiments on these datasets demonstrate the effectiveness of the proposed approach, which substantially outperforms the baseline methods.
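The abstract does not state what the two auxiliary tasks predict or how their losses are combined with the main task. As a purely illustrative sketch, not the CG-NDMM formulation, the snippet below shows one common way to set up such a multi-task objective: each of the four text types is decomposed into a domain label and a stego label, and a main binary cross-entropy loss is summed with weighted auxiliary losses. The names `TEXT_TYPES`, `bce`, and `multitask_loss`, the label encoding, the predicted probabilities, and the weights are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# The four text types named in the abstract, each decomposed (hypothetically)
# into two binary labels: domain (1 = generative, 0 = natural) and
# stego (1 = steganographic, 0 = cover).
TEXT_TYPES: Dict[str, Tuple[int, int]] = {
    "generative-cover": (1, 0),
    "generative-steganographic": (1, 1),
    "natural-cover": (0, 0),
    "natural-steganographic": (0, 1),
}


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(main_loss: float, aux_losses: List[float],
                   aux_weights: List[float]) -> float:
    """Hypothetical objective: main loss plus a weighted sum of auxiliary losses."""
    if len(aux_losses) != len(aux_weights):
        raise ValueError("one weight per auxiliary loss is required")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


if __name__ == "__main__":
    # Usage with made-up head outputs for one natural-steganographic sample.
    domain, stego = TEXT_TYPES["natural-steganographic"]
    main = bce(0.8, stego)   # hypothetical stego-detection head output
    aux = bce(0.3, domain)   # hypothetical domain-classification head output
    print(multitask_loss(main, [aux], [0.5]))
```

Weighting the auxiliary terms lets a shared encoder learn from the domain signal without letting it dominate the steganalysis objective; the paper's actual tasks and weighting may differ.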

Full Text