Video surveillance is used across a wide range of fields and industries. However, the rapid development of video processing technology has made surveillance video susceptible to a variety of malicious attacks. At present, state-of-the-art methods, including the latest deep learning techniques, cannot achieve satisfactory results on video surveillance object forgery detection (VSOFD) due to the following limitations: (i) the lack of VSOFD-specific features for effective processing and (ii) the lack of an effective deep network architecture designed explicitly for VSOFD. This paper proposes a new detection scheme to alleviate these limitations. The proposed approach first extracts VSOFD-specific features via the residual-based steganalysis feature (RSF) from the spatial-temporal-frequency domain; key clues in video frames can be learned more effectively from RSF than from raw frame images. The RSF features are then assembled into the residual-based steganography feature vector group (RSFVG), which serves as the input to the subsequent network. Finally, a new VSOFD-specific deep network architecture, the parallel-DenseNet-concatenated-LSTM (PDCL) network, is designed, comprising improved CNN and RNN modules. The improved CNN module fuses coarse-to-fine feature extraction while preserving the independence of individual video frames, and the improved RNN module learns the correlations between adjacent frames to identify forged frames. Experimental results show that the proposed scheme, using the PDCL network with RSF, achieves strong performance in test error, precision, recall, and F1 score on our newly constructed dataset (SYSU-OBJFORG plus newly generated forgery video clips). Compared with existing state-of-the-art methods, our framework achieves the best F1 score of 90.33%, an improvement of nearly 8%.
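The abstract describes a pipeline of parallel per-frame CNN feature extraction followed by an LSTM over the frame sequence. The following is only a minimal conceptual sketch of that idea; the layer sizes, class names, and the plain convolutional branch standing in for the improved DenseNet module are assumptions for illustration, not the authors' actual configuration.

```python
# Hypothetical sketch of the PDCL idea: a shared per-frame CNN branch applied
# in parallel across frames (preserving frame independence), whose embeddings
# are concatenated into a sequence and passed to an LSTM that models
# inter-frame correlations for per-frame forgery prediction.
import torch
import torch.nn as nn

class PDCLSketch(nn.Module):
    def __init__(self, feat_channels=1, hidden=128, num_classes=2):
        super().__init__()
        # Stand-in for the improved DenseNet (CNN) module, applied frame by frame
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(feat_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Stand-in for the improved RNN module over the per-frame embeddings
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, channels, H, W) residual-feature maps per frame
        b, t, c, h, w = x.shape
        frame_feats = self.frame_cnn(x.view(b * t, c, h, w)).view(b, t, -1)
        seq_out, _ = self.lstm(frame_feats)
        # Per-frame forgery logits
        return self.classifier(seq_out)

# Example: 4 clips of 8 frames of 64x64 single-channel residual features
logits = PDCLSketch()(torch.randn(4, 8, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 8, 2])
```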