Abstract

Video compressive sensing (VCS) aims to recover the scene video from a limited number of compressed measurements. Ideally, VCS would sense and recover the scene video in a joint spatial-temporal manner, but this is difficult to realize because of the complexity of design and optimization. Most current approaches therefore measure the scene video only in the spatial or the temporal domain, which discards the spatial-temporal correlation that VCS could exploit. Focusing on this issue, this paper proposes a VCS framework that combines a learned spatial-temporal sensing manner with a hybrid-3D recovery network. On the technical side, we develop a hybrid-3D residual block consisting of Pseudo-3D and True-3D sub-blocks. This structure enables the network to represent spatial-temporal features intuitively while significantly reducing the number of network parameters. In the detailed design, we explore the optimal configuration of the hybrid-3D blocks. Experimentally, we validate the effectiveness of the learned spatial-temporal sensing manner. In addition, experimental results show that the proposed method achieves state-of-the-art performance on video sequences from the Vid4 and SPMCS datasets, while the recovery speed is ultra-fast.
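The abstract describes the hybrid-3D residual block only at a high level. Below is a minimal PyTorch sketch of one plausible realization, assuming the Pseudo-3D sub-block factorizes a 3D convolution into a 1x3x3 spatial convolution followed by a 3x1x1 temporal convolution, and the True-3D sub-block uses a full 3x3x3 convolution; the channel count, sub-block ordering, and residual connection are illustrative assumptions, not the authors' exact design.

    import torch.nn as nn

    class Pseudo3DSubBlock(nn.Module):
        """Factorized (2+1)D convolution: 1x3x3 spatial conv followed by 3x1x1 temporal conv."""
        def __init__(self, channels):
            super().__init__()
            self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))
            self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.temporal(self.relu(self.spatial(x))))

    class True3DSubBlock(nn.Module):
        """Full 3x3x3 convolution over the spatio-temporal volume."""
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.conv(x))

    class Hybrid3DResidualBlock(nn.Module):
        """Pseudo-3D sub-block followed by a True-3D sub-block, wrapped in a residual connection."""
        def __init__(self, channels=64):
            super().__init__()
            self.pseudo = Pseudo3DSubBlock(channels)
            self.true3d = True3DSubBlock(channels)

        def forward(self, x):
            # x: (batch, channels, frames, height, width)
            return x + self.true3d(self.pseudo(x))

For C channels, the factorized Pseudo-3D pair costs 9·C² + 3·C² weights where a single True-3D 3x3x3 convolution costs 27·C², which is one way to read the parameter reduction the abstract refers to.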

Highlights

  • Compressive sensing (CS) theory [1]–[3] is able to acquire measurements of signals at sub-Nyquist rates and recover signals with high probability when the signals are sparse in a certain domain

  • According to the sensing manner, video compressive sensing (VCS) is divided into spatial-VCS (S-VCS) and temporal-VCS (T-VCS)

  • For S-VCS [29]–[32], the compression is applied only in the spatial domain, and the measurement stream is obtained from the scene video frame by frame (a minimal sketch contrasting the two sensing manners follows this list)
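To make the two conventional sensing manners concrete, here is a minimal NumPy sketch (not from the paper): S-VCS measures each frame independently with a spatial measurement matrix, while T-VCS modulates each frame with a per-frame mask and sums over time into a single coded snapshot. The toy sizes, the Gaussian measurement matrix, and the binary masks are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    T, H, W = 8, 32, 32                          # frames, height, width (toy sizes)
    video = rng.random((T, H, W))
    n = H * W                                    # pixels per frame

    # S-VCS: compress each frame in the spatial domain, frame by frame.
    cr_spatial = 0.25                            # spatial compression ratio
    m = int(cr_spatial * n)
    Phi = rng.standard_normal((m, n))            # one spatial measurement matrix
    s_measurements = np.stack([Phi @ video[t].reshape(n) for t in range(T)])
    print(s_measurements.shape)                  # (T, m): one measurement vector per frame

    # T-VCS: modulate frames with per-frame coded masks and sum over time.
    masks = rng.integers(0, 2, size=(T, H, W))   # binary masks, one per frame
    t_measurement = np.sum(masks * video, axis=0)
    print(t_measurement.shape)                   # (H, W): one coded snapshot covering T frames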


Summary

INTRODUCTION

Compressive sensing (CS) theory [1]–[3] is able to acquire measurements of signals at sub-Nyquist rates and recover the signals with high probability when they are sparse in a certain domain. The proposed framework consists of the learned spatial-temporal sensing and the hybrid-3D recovery network. In [32], the authors propose CSVideoNet, which exploits a multi-level compression strategy in the spatial domain to obtain measurements; it produces a preliminary recovery with a CNN and enhances the inter-frame correlation with a recurrent convolutional neural network (RNN) to improve the recovery quality. To further improve the reconstruction, we use hybrid-3D residual blocks to supplement the spatial-temporal information of the preliminary recovery. However, the recovery time of a True-3D block is far longer, especially when the block number is large. Considering both representation ability and recovery time, the hybrid-3D residual block offers a stronger ability to represent spatial-temporal features with less recovery time. To find an optimal network, we explore how the kernel sizes and the number of hybrid-3D residual blocks influence recovery performance; the proper model can then be customized by adjusting the number of blocks.
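The introduction states that the sensing is learned jointly in the spatial-temporal domain and that the model is customized by adjusting the number of hybrid-3D blocks. The following self-contained PyTorch sketch shows one way such a pipeline could be assembled; using a strided 3D convolution as the learned sensing operator, a transposed 3D convolution for the preliminary recovery, and the `num_blocks`/`channels` knobs are all illustrative assumptions, with the hybrid-3D block restated compactly from the earlier sketch.

    import torch
    import torch.nn as nn

    class Hybrid3DResidualBlock(nn.Module):
        """Compact restatement: Pseudo-3D (1x3x3 then 3x1x1) followed by True-3D (3x3x3), with a residual."""
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1)), nn.ReLU(inplace=True),
                nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0)), nn.ReLU(inplace=True),
                nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return x + self.body(x)

    class HybridVCSNet(nn.Module):
        """Toy pipeline: learned spatio-temporal sensing, preliminary recovery,
        then a configurable stack of hybrid-3D residual blocks."""
        def __init__(self, num_blocks=4, channels=64, stride=(2, 4, 4)):
            super().__init__()
            # Learned spatio-temporal sensing: a strided 3D convolution compresses
            # the video jointly in time and space (assumed realization).
            self.sense = nn.Conv3d(1, channels, kernel_size=stride, stride=stride)
            # Preliminary recovery: transposed convolution back to the video size.
            self.init_recovery = nn.ConvTranspose3d(channels, channels, kernel_size=stride, stride=stride)
            # Deep refinement: the block number is the design knob explored in the paper.
            self.blocks = nn.Sequential(*[Hybrid3DResidualBlock(channels) for _ in range(num_blocks)])
            self.out = nn.Conv3d(channels, 1, kernel_size=3, padding=1)

        def forward(self, video):
            # video: (batch, 1, frames, height, width)
            measurements = self.sense(video)
            preliminary = self.init_recovery(measurements)
            return self.out(self.blocks(preliminary))

    # Usage: a 16-frame, 64x64 grayscale clip compressed and recovered end to end.
    net = HybridVCSNet(num_blocks=4)
    clip = torch.randn(1, 1, 16, 64, 64)
    recovered = net(clip)
    print(recovered.shape)  # torch.Size([1, 1, 16, 64, 64])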

