Image fusion produces a single image by integrating the useful information from multiple input images. Most traditional fusion strategies are guided by local image contrast or variance, which do not represent the visually discernible features of the source images well. Moreover, inconsistency between the fusion weight map and the image content can produce seam effects or artifacts that severely degrade the visual quality of the fused image. An efficient image fusion method with a structural saliency measure and content-adaptive consistency verification is proposed. The fusion is implemented within a nonsubsampled contourlet transform (NSCT)-based image fusion framework. The low-frequency NSCT decomposition coefficients are fused using a weight map that is constructed from both structural saliency and visual uniqueness features and then refined for spatial consistency with a guided filter. The high-frequency NSCT decomposition coefficients are fused according to structural saliency. The performance of the proposed method is verified on several pairs of multifocus images, infrared-visible images, and multimodal medical images. Experimental results show that the proposed algorithm outperforms several existing state-of-the-art algorithms in both visual and quantitative comparisons.
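To make the two fusion rules concrete, the following is a minimal sketch of how saliency-driven fusion of already decomposed subbands might look. It is not the method of this paper: the normalized-saliency weighting for the low-frequency band and the choose-max rule for the high-frequency band are assumptions made for illustration, the function names `fuse_lowpass` and `fuse_highpass` are hypothetical, and the saliency maps are taken as given inputs. The NSCT decomposition, the structural saliency and visual uniqueness measures, and the guided-filter refinement of the weight map are all outside the scope of the sketch.

```python
import numpy as np


def fuse_lowpass(low_a, low_b, sal_a, sal_b, eps=1e-12):
    """Weighted average of two low-frequency bands.

    ASSUMPTION: the weight of source A is its share of the total saliency.
    The abstract does not give the actual weight formula.
    """
    # eps keeps the weight at 0.5 where both saliency maps are zero.
    w_a = (sal_a + eps) / (sal_a + sal_b + 2.0 * eps)
    return w_a * low_a + (1.0 - w_a) * low_b


def fuse_highpass(high_a, high_b, sal_a, sal_b):
    """Per coefficient, keep the value from the more salient source.

    ASSUMPTION: a choose-max rule; the abstract only says that the
    high-frequency coefficients are fused with structural saliency.
    """
    return np.where(sal_a >= sal_b, high_a, high_b)


# Toy 2x2 "subbands" and placeholder saliency maps (illustrative values only).
low_a = np.array([[10.0, 20.0], [30.0, 40.0]])
low_b = np.array([[50.0, 60.0], [70.0, 80.0]])
high_a = np.array([[1.0, -2.0], [3.0, -4.0]])
high_b = np.array([[-5.0, 6.0], [-7.0, 8.0]])
sal_a = np.array([[1.0, 0.0], [3.0, 0.0]])
sal_b = np.array([[1.0, 1.0], [1.0, 0.0]])

fused_low = fuse_lowpass(low_a, low_b, sal_a, sal_b)
fused_high = fuse_highpass(high_a, high_b, sal_a, sal_b)
```

In a full pipeline, the weight map computed inside `fuse_lowpass` would presumably be smoothed against the source image before use, which is the role the guided filter plays in the text above.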