High-dynamic-range (HDR) video reconstruction with conventional single-exposure sensors can be achieved by temporally alternating exposures. This, in turn, requires aligning the exposures, which is difficult because the exposure differences notoriously create problems for moving content, in particular in larger saturated and dis-occluded regions. An attractive alternative is offered by dual-exposure sensors that capture, in a single shot, differently exposed and spatially interleaved half-frames, which are therefore perfectly aligned in space and time (up to varying motion blur) by construction. In this work, we demonstrate that we can successfully compensate for the reduced spatial resolution and aliasing of such sensors, and that, for a given number of alternating exposures, we improve the overall quality and dynamic range of reconstructed HDR video with respect to single-exposure sensors. Specifically, we consider low, mid, and high exposures, and we propose to capture the mid exposure in every frame so that it serves as a spatial and temporal reference. We capitalize on neural networks for denoising, deblurring, and upsampling, so that we effectively obtain two clean, sharp, full-resolution exposures for every frame; these are then complemented by warping in the missing third exposure. High-quality warping is achieved by learning an optical flow that merges the individual flows computed for each specific exposure. Such flow merging is instrumental in handling saturated and dis-occluded image regions, while the dense temporal sampling of the mid exposure improves motion reproduction between the more sparsely sampled exposures. We further demonstrate that, by capturing only a limited amount of sensor-specific data and by using noise histograms instead of common parametric noise statistics, we can generate synthetic training data that lead to better denoising and deblurring quality than existing state-of-the-art methods achieve. Since not enough high-quality HDR video is available, we devise a method to learn from LDR video instead. Our approach compares favorably to several strong baselines, and it can boost existing HDR image and video methods when they are re-trained on our data.
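The abstract mentions replacing parametric noise statistics with histograms measured from a limited amount of sensor-specific data when synthesizing training data. The paper's exact procedure is not given here, so the following is only a minimal sketch, under the assumption that per-intensity noise histograms are built from paired clean/noisy calibration frames and then sampled per pixel; the function names (build_noise_histograms, add_histogram_noise) and all parameters are hypothetical, not taken from the paper.

import numpy as np

# Hypothetical sketch: sample sensor noise from measured per-intensity histograms
# instead of a parametric (e.g., Gaussian/Poissonian) noise model.

def build_noise_histograms(clean, noisy, n_levels=256, n_bins=201, noise_range=(-0.1, 0.1)):
    """For each quantized clean-intensity level (clean assumed in [0, 1]),
    histogram the observed noise (noisy - clean) over all pixels at that level."""
    edges = np.linspace(noise_range[0], noise_range[1], n_bins + 1)
    hists = np.zeros((n_levels, n_bins), dtype=np.float64)
    levels = np.clip((clean * (n_levels - 1)).round().astype(int), 0, n_levels - 1)
    noise = noisy - clean
    for lvl in range(n_levels):
        mask = levels == lvl
        if mask.any():
            hists[lvl], _ = np.histogram(noise[mask], bins=edges)
    # Normalize each row into a probability distribution (uniform fallback for empty rows).
    row_sums = hists.sum(axis=1, keepdims=True)
    hists = np.where(row_sums > 0, hists / np.maximum(row_sums, 1), 1.0 / n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return hists, centers

def add_histogram_noise(clean, hists, centers, rng=None):
    """Corrupt a clean image by drawing, per pixel, a noise value from the
    histogram associated with that pixel's intensity level (inverse-CDF sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    n_levels = hists.shape[0]
    levels = np.clip((clean * (n_levels - 1)).round().astype(int), 0, n_levels - 1)
    flat = levels.ravel()
    cdfs = np.cumsum(hists, axis=1)
    u = rng.random(flat.shape)
    bin_idx = np.clip((cdfs[flat] < u[:, None]).sum(axis=1), 0, len(centers) - 1)
    noise = centers[bin_idx].reshape(clean.shape)
    return np.clip(clean + noise, 0.0, 1.0)

Sampling the empirical, intensity-conditioned noise distribution in this way can capture signal-dependent and non-Gaussian behavior (such as clipping near saturation) that a fitted parametric model would smooth over, which is presumably why the histogram-based statistics help the denoising and deblurring networks.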