Plenoptic cameras record both the spatial and angular information of incident rays as 4D light field (LF) images, which offer unique advantages in a wide range of computer vision and graphics applications. However, plenoptic cameras usually suffer from image quality degradation due to limited resolution, very small sub-apertures for sub-views, improper exposure, and color quantization of image sensors. Raw macro-pixel LF images captured by plenoptic cameras are usually decomposed into an array of sub-views, and this decomposition, together with any improper correction, further degrades the quality of LF images. As a result, the sub-views of an LF image commonly suffer from low dynamic range, reduced brightness, color deviation, and missing textural details in improperly exposed areas, largely because of the small sub-aperture of each sub-view. We observe that the brightness (tone) ranges of digital single-lens reflex (DSLR) camera images cannot be assumed to be always better than those of LF images, even when both are captured from the same real-world scenes. Thus, instead of directly taking the accompanying DSLR images as ground truth for enhancing LF images, we propose an unsupervised neural network, called LFIENet, for properly fusing the exposures of LF-DSLR image pairs. With the help of the corresponding DSLR images, the enhanced LF images contain more abundant textural details and have extended dynamic ranges and better contrast. Since histogram equalization of the brightness range can extend dynamic range and improve image contrast, we propose a Histogram Equalization Attention Module (HEAM) to discover over- and under-exposed areas for properly fusing LF-DSLR image pairs. In addition, to train the proposed network, we introduce a real-world LF-DSLR image pair dataset. Extensive experiments on various challenging real-world LF images demonstrate the effectiveness of our network.
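To make the histogram equalization idea concrete, the following is a minimal sketch of classic global histogram equalization on an 8-bit grayscale image, followed by a simple per-pixel map of how much equalization changes each pixel. It is not the HEAM described above, whose design is not specified here; the function names `equalize_histogram` and `exposure_weight_map`, and the use of the absolute difference as an exposure cue, are assumptions made purely for illustration.

```python
import numpy as np


def equalize_histogram(gray):
    """Classic global histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return gray.copy()
    # Map each intensity level through the normalized CDF to the range 0..255.
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def exposure_weight_map(gray):
    """Hypothetical exposure cue (an assumption, not the paper's HEAM):
    pixels that equalization changes the most get weights near 1."""
    eq = equalize_histogram(gray).astype(np.float32)
    return np.abs(eq - gray.astype(np.float32)) / 255.0


if __name__ == "__main__":
    # Synthetic under-exposed image: intensities confined to 0..63.
    dark = np.arange(64, dtype=np.uint8).repeat(4).reshape(16, 16)
    eq = equalize_histogram(dark)
    print(dark.min(), dark.max(), "->", eq.min(), eq.max())
    print(exposure_weight_map(dark).max())
```

In this sketch, an image whose intensities occupy only a narrow band is stretched to the full 8-bit range, and the weight map is largest where the original tone deviates most from its equalized value. A learned attention module would presumably replace this hand-crafted difference with trainable features.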