Optical and infrared imaging are both widely used in ground-based observation of space targets. Fusing the two types of images to obtain a more detailed observation is a key problem. A multimodal image fusion scheme for space targets is proposed. It is based on the joint sparsity model and takes into account the correlations among the native sparse characteristics of the images, their clarity features, and the multisource images. First, using an overcomplete dictionary, the source images are represented as a combination of a shared sparse component and exclusive sparse components. Second, a method for image clarity feature extraction is proposed and used to design the fusion rules that yield the fused exclusive sparse components. Finally, the fused image is reconstructed from the fused sparse components and the overcomplete dictionary. The proposed method was tested on space target and natural scene image data sets. Experimental comparisons with traditional methods, including multiscale transform-based, sparse representation-based, and joint sparse representation-based methods, show that the proposed method outperforms existing state-of-the-art methods in both visual quality and objective evaluation indexes. In particular, for the evaluation indexes $Q^{AB/F}$ and $Q_E$, the scores are nearly 10% higher than those of traditional methods, which indicates that the fused images produced by our method have better edge clarity.
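The three steps above can be illustrated with a minimal patch-level sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the overcomplete DCT dictionary (the abstract does not say how the dictionary is obtained), the greedy orthogonal matching pursuit solver, the gradient-energy stand-in for the clarity feature, and the clarity-weighted averaging rule. All function names (`overcomplete_dct`, `omp`, `clarity`, `jsm_fuse_patch`) are hypothetical.

```python
import numpy as np

def overcomplete_dct(patch=8, atoms=11):
    """Separable overcomplete DCT dictionary, shape (patch**2, atoms**2).
    Assumption: a fixed analytic dictionary stands in for the paper's one."""
    t = np.arange(patch)[:, None] * np.arange(atoms)[None, :]
    d1 = np.cos(np.pi * t / atoms)
    d1[:, 1:] -= d1[:, 1:].mean(axis=0)      # remove DC from non-constant atoms
    d = np.kron(d1, d1)
    return d / np.linalg.norm(d, axis=0)     # unit-norm columns

def omp(A, y, n_nonzero, tol=1e-8):
    """Greedy orthogonal matching pursuit: sparse z with A @ z close to y."""
    norms = np.linalg.norm(A, axis=0)
    used = np.zeros(A.shape[1], dtype=bool)
    support, sol = [], np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(A.T @ residual) / norms
        corr[used] = 0.0                     # never select an atom twice
        j = int(np.argmax(corr))
        used[j] = True
        support.append(j)
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
    z = np.zeros(A.shape[1])
    if support:
        z[support] = sol
    return z

def clarity(vec, size):
    """Stand-in clarity feature (assumption): gradient energy of the patch."""
    p = vec.reshape(size, size)
    return float((np.diff(p, axis=0) ** 2).sum() + (np.diff(p, axis=1) ** 2).sum())

def jsm_fuse_patch(x1, x2, D, n_nonzero=8):
    """Fuse two vectorized patches under x_k = D @ z_c + D @ z_k (k = 1, 2)."""
    n, k = D.shape
    size = int(round(np.sqrt(n)))
    zero = np.zeros((n, k))
    # Step 1: joint dictionary; columns code [shared | exclusive 1 | exclusive 2].
    A = np.block([[D, D, zero], [D, zero, D]])
    z = omp(A, np.concatenate([x1, x2]), n_nonzero)
    zc, z1, z2 = z[:k], z[k:2 * k], z[2 * k:]
    # Step 2: fuse exclusive components, weighted by the clarity of each part.
    c1, c2 = clarity(D @ z1, size), clarity(D @ z2, size)
    w1 = 0.5 if c1 + c2 == 0 else c1 / (c1 + c2)
    zf = w1 * z1 + (1.0 - w1) * z2
    # Step 3: reconstruct from the shared and fused exclusive components.
    return D @ (zc + zf), zc, z1, z2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = overcomplete_dct()
    optical, infrared = rng.standard_normal(64), rng.standard_normal(64)
    fused, *_ = jsm_fuse_patch(optical, infrared, D)
    print(fused.reshape(8, 8).shape)
```

In a full pipeline this routine would presumably run over overlapping patches of the registered optical and infrared images, with the fused patches averaged back into the output image. The weighted rule is only one plausible way to use a clarity score; a choose-max rule on the same score would be an equally simple alternative.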