Abstract

Unlike single low-light image enhancement, the key to video enhancement is improving temporal stability in a spatially and temporally varying manner. However, most methods fail to effectively align the images or features of adjacent frames when fusing spatio-temporal information, and thus achieve limited enhancement performance, especially on dynamic videos with scene changes and moving targets. In this paper, we propose a spatio-temporal propagation and reconstruction (STPR) network that enables video representation learning for low-light video enhancement. Specifically, we develop a recursively supervised pyramid residual dense structure for extracting expressive context features from video sequences; this structural design enlarges the receptive field in multi-scale space with a limited number of training parameters. Moreover, to strengthen the interaction of features across frames, a feature propagation subnet is presented to achieve precise alignment and effective fusion of spatio-temporal features. After obtaining the aggregated features of adjacent frames, the spatio-temporal feature reconstruction subnet exploits temporal dependence to improve the quality of target frames and enforce temporal consistency. Experimental results on several low-light video datasets demonstrate that our network outperforms state-of-the-art methods in both quantitative and visual comparisons.
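
To illustrate the three-stage organization described above (context feature extraction, cross-frame propagation and fusion, and spatio-temporal reconstruction), the following is a minimal PyTorch sketch. All module names, layer choices, and tensor shapes are placeholders assumed for illustration; they are not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical structural sketch of an STPR-style pipeline.
# Layer configurations are assumptions, not the authors' design.

class PyramidResidualDenseExtractor(nn.Module):
    """Stand-in for the pyramid residual dense feature extractor."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, frame):                 # frame: (B, 3, H, W)
        feat = self.head(frame)
        return feat + self.body(feat)         # residual context features


class FeaturePropagation(nn.Module):
    """Fuses features of adjacent frames with the target frame.
    A real implementation would first align (e.g. warp) neighbor features."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, prev_feat, curr_feat, next_feat):
        return self.fuse(torch.cat([prev_feat, curr_feat, next_feat], dim=1))


class Reconstruction(nn.Module):
    """Maps aggregated spatio-temporal features back to an enhanced frame."""
    def __init__(self, channels=64):
        super().__init__()
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, fused_feat, curr_frame):
        return curr_frame + self.tail(fused_feat)   # residual enhancement


class STPRSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.extract = PyramidResidualDenseExtractor()
        self.propagate = FeaturePropagation()
        self.reconstruct = Reconstruction()

    def forward(self, prev_frame, curr_frame, next_frame):
        feats = [self.extract(f) for f in (prev_frame, curr_frame, next_frame)]
        fused = self.propagate(*feats)
        return self.reconstruct(fused, curr_frame)


# Example: enhance the middle frame of a 3-frame low-light clip.
# model = STPRSketch()
# out = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```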
