High-dynamic-range (HDR) imaging is an effective way to overcome the limited dynamic range of a camera. However, most current HDR imaging techniques rely on fusing multiple frames captured at different exposure levels. Such methods are prone to motion artifacts, detail loss, and edge effects. In this paper, building on a dual-channel camera that outputs two images with different gains simultaneously, we propose a semi-supervised network based on an attention mechanism to fuse the multi-gain images. The proposed network comprises encoding, fusion, and decoding modules. First, a U-Net structure is employed in the encoding module to extract as much important detail from the source images as possible. The SENet attention mechanism is also employed in the encoding module to assign different weights to different feature channels and emphasize important features. Then, the feature maps extracted by the encoding module are combined by the fusion module and passed to the decoding module, which reconstructs the fused image. Experimental results indicate that the fused images obtained by the proposed method show clear details and high contrast. Compared with other methods, the proposed method improves fused image quality on several evaluation metrics.
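The channel weighting attributed to the SENet attention mechanism can be illustrated with a minimal sketch of a generic squeeze-and-excitation step. This is not the network proposed here: the function names `se_channel_weights` and `se_block`, the weight matrices `w1` and `w2`, and the omission of bias terms are assumptions made for illustration, and plain Python lists stand in for feature tensors.

```python
import math


def se_channel_weights(features, w1, w2):
    """Return one attention weight in (0, 1) per feature channel.

    features: list of C channels, each an H x W list of lists of floats.
    w1: hypothetical reduction weights, shape (C // r) x C.
    w2: hypothetical expansion weights, shape C x (C // r).
    Bias terms are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid.
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]


def se_block(features, w1, w2):
    """Rescale every channel by its attention weight."""
    weights = se_channel_weights(features, w1, w2)
    return [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(features, weights)
    ]


if __name__ == "__main__":
    # Two 2 x 2 channels with hand-picked (not learned) weights.
    feats = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    w1 = [[1.0, 0.0]]
    w2 = [[0.0], [1.0]]
    print(se_channel_weights(feats, w1, w2))
```

In a trained network, `w1` and `w2` would be learned, so channels that carry useful detail receive weights closer to 1 and less informative channels are suppressed.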