Frame-wise steganalysis is important for active steganography defense. By detecting steganography frame by frame, we can accurately locate where secret information is embedded and then destroy the covert channel. However, no existing research specifically targets frame-wise steganalysis of low-bit-rate compressed speech. Moreover, most existing steganalysis methods are designed for a single category of steganography methods, which makes them difficult to apply in practical scenarios where the steganography algorithm is unknown. In this paper, we propose a general frame-wise steganalysis method for low-bit-rate compressed speech. To extract rich features from a speech frame, we propose a dual-domain representation, which performs feature extraction in both the compressed domain and the decoded time domain. In addition, we propose an efficient steganalysis network named Stegaformer, which learns the intra-frame correlation from the obtained representation to enable steganalysis. In Stegaformer, an adaptive local correlation enhancement module is introduced to effectively model local characteristics, which compensates for a drawback of traditional Transformer-based models. Experimental results show that our method outperforms existing steganalysis methods in detecting multiple steganography methods within a speech frame.