Abstract
Deep neural networks (DNNs) have recently been applied to the video compressive sensing (VCS) task. Existing DNN-based VCS methods compress and reconstruct the scene video in either the spatial or the temporal dimension alone, ignoring the spatial-temporal correlation of the video. Moreover, they generally adopt a pixel-wise loss function, which leads to over-smoothed reconstructions. In this paper, we propose a perceptual spatial-temporal VCS network. The spatial-temporal VCS network, which compresses and recovers the video in both the spatial and temporal dimensions, preserves the spatial-temporal correlation of the video. In addition, we refine the perceptual loss by selecting specific feature-wise loss terms and adding a pixel-wise loss term. The refined perceptual loss guides the spatial-temporal network to retain more textures and structures. Experimental results show that the proposed method achieves better visual quality with less recovery time than the state of the art.
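For concreteness, a refined perceptual loss of the kind described above can be sketched as a weighted sum of selected feature-wise terms and a pixel-wise term. This is a minimal sketch, not the paper's exact formulation: the layer set $S$, the feature maps $\phi_j$ (e.g., activations of selected layers of a pretrained network such as VGG), and the weights $\lambda_j$ and $\lambda_{\mathrm{pix}}$ are illustrative assumptions.

% Sketch of a refined perceptual loss: selected feature-wise terms
% plus a pixel-wise term. S, \phi_j, \lambda_j, \lambda_{pix} are
% assumed for illustration only.
\mathcal{L} \;=\; \sum_{j \in S} \lambda_j \,\bigl\| \phi_j(\hat{x}) - \phi_j(x) \bigr\|_2^2 \;+\; \lambda_{\mathrm{pix}} \,\bigl\| \hat{x} - x \bigr\|_2^2,

where $x$ is a ground-truth frame and $\hat{x}$ its reconstruction. Restricting the sum to specific layers and keeping the pixel-wise term is what lets such a loss balance texture and structure against pixel fidelity.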