Infrared and visible image fusion is an effective way to overcome the limitations of single-sensor imaging. Its purpose is to produce fused images that are suited to human visual perception and to subsequent processing and applications. Existing methods suffer from incomplete feature extraction and loss of detail, and common datasets contain too few samples for effective training. To address these problems, an end-to-end network architecture for image fusion is proposed. U-net is introduced into image fusion as the generator, and the final fusion result is obtained with a generative adversarial network. The convolutional structure of U-net extracts as much of the important feature information as possible, and the samples do not need to be cropped, which avoids the resulting loss of fusion accuracy and also improves training speed. The features extracted by U-net are then trained adversarially against a discriminator that takes the infrared image as input, yielding the generator model. Experimental results show that the proposed algorithm produces fused images with clear outlines, prominent texture, and salient targets. Standard deviation (SD), spatial frequency (SF), structural similarity (SSIM), average gradient (AG), and other indicators are clearly improved.
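As an illustration of three of the no-reference indicators named above, the following minimal sketch computes standard deviation (SD), spatial frequency (SF), and average gradient (AG) for a single grayscale image with NumPy. The function names are hypothetical, and the formulas follow definitions commonly used in the fusion literature; normalization conventions vary between papers, so the exact values reported in the experiments may have been computed differently. SSIM is omitted because it requires a reference image.

```python
import numpy as np


def standard_deviation(img):
    """SD: spread of pixel intensities around the mean (a contrast proxy)."""
    img = np.asarray(img, dtype=np.float64)
    return float(img.std())


def spatial_frequency(img):
    """SF: sqrt(RF^2 + CF^2) from horizontal and vertical first differences.

    Assumption: each mean is taken over the number of available differences;
    some papers divide by the full pixel count instead.
    """
    img = np.asarray(img, dtype=np.float64)
    row_diff = img[:, 1:] - img[:, :-1]  # differences along each row
    col_diff = img[1:, :] - img[:-1, :]  # differences along each column
    rf_sq = np.mean(row_diff ** 2)
    cf_sq = np.mean(col_diff ** 2)
    return float(np.sqrt(rf_sq + cf_sq))


def average_gradient(img):
    """AG: mean of sqrt((dx^2 + dy^2) / 2) over the (M-1) x (N-1) interior grid."""
    img = np.asarray(img, dtype=np.float64)
    dx = img[:-1, 1:] - img[:-1, :-1]
    dy = img[1:, :-1] - img[:-1, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


if __name__ == "__main__":
    # Toy input: a 4 x 4 checkerboard of zeros and ones (not real fusion data).
    board = np.indices((4, 4)).sum(axis=0) % 2
    print(standard_deviation(board), spatial_frequency(board), average_gradient(board))
```

Under these definitions, higher values of all three indicators generally mean more contrast and finer detail in the fused image, which is why they are reported alongside SSIM.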