Real-time steganalysis for streaming media based on multi-channel convolutional sliding windows

Zhongliang Yang, Hao Yang, Ching-Chun Chang, Yongfeng Huang, Chin-Chen Chang

doi:10.1016/j.knosys.2021.107561

Abstract

In recent years, covert communication technologies based on Voice over Internet Protocol (VoIP) have received increasing attention, and they pose a significant threat to the security of cyberspace. In this paper, we focus on improving both the accuracy and the efficiency of covert communication detection, and we propose a real-time VoIP steganalysis model to address these issues. Multi-channel convolutional sliding windows (CSW) are developed to analyze the correlations between a given frame and its neighboring frames in a VoIP signal. Within each sliding window, we employ two feature extraction channels to extract correlation features from the input signal. Each channel consists of multiple convolutional layers with a large number of convolution kernels. The extracted features are then fed to a fully connected feed-forward network for feature fusion. By analyzing the statistical distribution of these features, the discriminator determines whether the input speech signal contains covert information. We design several experiments to test the proposed model’s detection performance under various conditions, including different embedding rates and different speech lengths. Experimental results show that the proposed model can efficiently and accurately detect steganographic voice streams, especially at low embedding rates. Further experiments demonstrate that the proposed model attains nearly real-time detection of VoIP speech signals and achieves state-of-the-art performance.
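The pipeline outlined in the abstract (sliding windows over frames, two convolutional feature channels per window, fully connected fusion, and a binary decision) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes each frame is reduced to a single scalar, uses one convolutional layer per channel instead of several, takes the maximum response per kernel as the feature, and uses random, untrained weights. All function names, kernel widths, and window sizes here are hypothetical.

```python
import math
import random


def sliding_windows(frames, size, stride=1):
    """Yield consecutive windows of `size` frames, advancing by `stride`."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def conv1d_relu(seq, kernel):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(seq[i + j] * kernel[j] for j in range(k)))
            for i in range(len(seq) - k + 1)]


def channel_features(window, kernels):
    """One feature channel: apply every kernel and keep its maximum response."""
    return [max(conv1d_relu(window, kern)) for kern in kernels]


def score_window(window, chan_a, chan_b, fc_weights, fc_bias):
    """Concatenate both channels, then fuse with one dense unit and a sigmoid."""
    feats = channel_features(window, chan_a) + channel_features(window, chan_b)
    z = sum(w * f for w, f in zip(fc_weights, feats)) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical setup: untrained random weights, scalar frames in [0, 1).
rng = random.Random(0)
frames = [rng.random() for _ in range(20)]
chan_a = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 kernels, width 3
chan_b = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]  # 4 kernels, width 5
fc_weights = [rng.uniform(-1, 1) for _ in range(8)]  # one weight per fused feature

# One score per window position; a trained model would threshold these.
scores = [score_window(w, chan_a, chan_b, fc_weights, 0.0)
          for w in sliding_windows(frames, size=8)]
```

Because each window is scored independently, a score can be produced as soon as the last frame of that window arrives, which is the property a streaming detector relies on. With trained weights, a score above a chosen threshold would flag the window as likely carrying covert information.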

Full Text