Multimodal change detection (MCD) exploits multi-source remote sensing data to identify surface changes, enabling applications in disaster management and environmental monitoring. However, differences in imaging mechanisms across modalities prevent the direct comparison of multimodal images. In response, many methods based on deep learning features have been developed to derive comparable features from such images. Yet several of these approaches depend on manually labeled samples, which are costly to obtain, and their accuracy in separating changed from unchanged regions remains unsatisfactory. To address these challenges, this paper proposes a new MCD method based on iterative optimization-enhanced contrastive learning. Guided by positive and negative sample pairs in contrastive learning, a deep feature extraction network first extracts initial deep features from the multimodal images. A shared projection layer then maps the deep features of both images into a common feature space. Next, an iterative optimization module enlarges the gap between changed and unchanged areas, improving the quality of the deep features. The final change map is derived from similarity measurements on these optimized features. Experiments on four real-world multimodal datasets, benchmarked against eight well-established methods, demonstrate the superiority of the proposed approach.
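
To make the described pipeline concrete, below is a minimal PyTorch sketch of its stages: per-modality feature extraction, a shared projection into a common space, an iterative contrastive-optimization loop, and a similarity-based change map. All specifics here are assumptions for illustration, not the paper's implementation: the encoder architectures, the margin-based contrastive loss, the random pseudo-label masks standing in for positive/negative pairs, and the final threshold are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small convolutional encoder, one per modality (hypothetical architecture)."""
    def __init__(self, in_ch, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class SharedProjection(nn.Module):
    """1x1 conv shared by both branches, mapping features into one common space."""
    def __init__(self, feat_dim=32, proj_dim=16):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, proj_dim, 1)
    def forward(self, f):
        return F.normalize(self.proj(f), dim=1)  # unit-norm feature per pixel

def contrastive_loss(z1, z2, pos_mask, neg_mask, margin=0.5):
    """Pull presumed-unchanged (positive) pixel pairs together and push
    presumed-changed (negative) pairs apart; margin form is an assumption."""
    sim = (z1 * z2).sum(dim=1)  # per-pixel cosine similarity
    pos_term = ((1.0 - sim) * pos_mask).sum() / pos_mask.sum().clamp(min=1)
    neg_term = (F.relu(sim - margin) * neg_mask).sum() / neg_mask.sum().clamp(min=1)
    return pos_term + neg_term

# ---- toy usage on random tensors (shapes only, not the paper's datasets) ----
x1 = torch.rand(1, 3, 64, 64)   # e.g. an optical image
x2 = torch.rand(1, 1, 64, 64)   # e.g. a SAR image
enc1, enc2, proj = Encoder(3), Encoder(1), SharedProjection()
params = list(enc1.parameters()) + list(enc2.parameters()) + list(proj.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# Positive/negative masks would normally come from a pre-classification step;
# random placeholders are used here.
pos_mask = (torch.rand(1, 64, 64) > 0.3).float()
neg_mask = 1.0 - pos_mask

for it in range(5):  # stands in for the iterative optimization module
    z1, z2 = proj(enc1(x1)), proj(enc2(x2))
    loss = contrastive_loss(z1, z2, pos_mask, neg_mask)
    opt.zero_grad(); loss.backward(); opt.step()
    # In the paper's iterative scheme, the positive/negative assignments would
    # be re-estimated here from the updated similarity map; omitted in this sketch.

with torch.no_grad():
    z1, z2 = proj(enc1(x1)), proj(enc2(x2))
    similarity = (z1 * z2).sum(dim=1)         # high similarity = likely unchanged
    change_map = (similarity < 0.5).float()   # threshold is a placeholder
```

The shared projection is the key structural choice in this sketch: because both modalities pass through the same layer, their features are compared in a single space, and the contrastive objective can meaningfully widen the similarity gap between changed and unchanged pixels across iterations.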