Synthesizing the high dynamic range (HDR) image from multi-exposure images has been extensively studied by exploiting convolutional neural networks (CNNs) recently. Despite the remarkable progress, existing CNN-based methods have the intrinsic limitation of local receptive field, which hinders the model’s capability of capturing long-range correspondence and large motions across under/over-exposure images, resulting in ghosting artifacts of dynamic scenes. To address the above challenge, we propose a novel Ed ge-gu i ded T ransf or mer framework (EdiTor) customized for ghost-free HDR reconstruction, where the long-range motions across different exposures can be delicately modeled by incorporating the edge prior. Specifically, EdiTor calculates patch-wise correlation maps on both image and edge domains, enabling the network to effectively model the global movements and the fine-grained shifts across multiple exposures. Based on this framework, we further propose an exposure-masked loss to adaptively compensate for the severely distorted regions ( e.g. , highlights and shadows). Experiments demonstrate that EdiTor outperforms state-of-the-art methods both quantitatively and qualitatively, achieving appealing HDR visualization with unified textures and colors.
Read full abstract