Abstract

Video snapshot compressive imaging (SCI) systems enable high-frame-rate imaging by projecting multiple frames into a single 2D snapshot measurement during one exposure; the original video frames are then reconstructed by solving an optimization problem. However, existing methods usually fail to balance reconstruction time against reconstruction quality, which has become a major obstacle to the practical application of video SCI. To address this issue, we propose a residual ensemble network that learns the explicit inverse mapping from the 2D snapshot measurement to the original video. Specifically, the proposed network exploits the spatiotemporal correlations between video frames to improve reconstruction quality. These correlations take several forms: intra-frame spatial correlation, and inter-frame forward and backward temporal correlation. To capture these different correlations fully, we design four sub-networks, namely a pseudo-3D U-shape sub-network, two residual sub-networks, and a serial forward and backward recurrent sub-network, and assemble them into an ensemble network through alternate residual links. The ensemble network effectively fuses the predictions of the sub-networks and maintains spatiotemporal consistency between video frames. We further design a compound loss function to guide network learning, so that a new video can be reconstructed quickly by simply feeding its 2D snapshot measurement into the trained network. Experimental results demonstrate that our network significantly improves reconstruction quality while maintaining low computational cost.
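
For readers new to video SCI, the projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used mask-modulated forward model, in which each frame is multiplied element-wise by its own coding mask and the results are summed into one 2D measurement. The abstract does not specify the sensing model, so the function name `simulate_snapshot`, the random binary masks, and the array sizes are all hypothetical choices made for illustration; this is not the authors' implementation.

```python
import numpy as np


def simulate_snapshot(frames, masks):
    """Collapse T frames into one 2D measurement: Y = sum_t masks[t] * frames[t].

    Assumed forward model (not stated in the abstract): per-frame element-wise
    mask modulation followed by summation over the time axis.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.ndim != 3 or frames.shape != masks.shape:
        raise ValueError("frames and masks must both have shape (T, H, W)")
    # Modulate every frame by its mask, then integrate over the exposure.
    return (masks * frames).sum(axis=0)


# Hypothetical sizes: 8 frames of 4x4 pixels compressed into one 4x4 snapshot.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))               # stand-in video frames in [0, 1)
masks = rng.integers(0, 2, size=(T, H, W))   # stand-in random binary coding masks
snapshot = simulate_snapshot(frames, masks)
print(snapshot.shape)  # (4, 4)
```

Under this assumed model, reconstruction means recovering all `T` frames from the single `snapshot` and the known `masks`, which is the inverse mapping the proposed network is trained to approximate.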
