Abstract. Complex geometric distortions and nonlinear radiometric differences between optical and synthetic aperture radar (SAR) images make it difficult to match a sufficient number of evenly distributed corresponding points. To address this problem, this paper proposes an attention-based deep convolutional network for matching optical and SAR images. To obtain robust feature points, we employ phase congruency instead of image intensity and gradient information for feature detection. A deep convolutional network (DCN) is designed to extract high-level semantic features shared by optical and SAR images, providing robustness to geometric distortion and nonlinear radiometric changes. Notably, incorporating multiple inverted residual structures in the DCN enables efficient extraction of local and global features, promotes feature reuse, and reduces the loss of key features. Furthermore, a dense feature fusion module based on coordinate attention is designed; it focuses on the spatial positions of salient features and integrates them into the deep descriptors, enhancing the descriptors' robustness to nonlinear radiometric differences. A coarse-to-fine strategy is then employed to improve accuracy by eliminating mismatches. Experimental results demonstrate that the proposed network outperforms both hand-crafted descriptor-based methods and state-of-the-art deep learning networks in matching effectiveness and accuracy: it obtains approximately twice as many matches as the other methods, with a 10% improvement in F-measure.
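The abstract does not specify the implementation of the coordinate-attention fusion module, but as a rough illustration of the mechanism it refers to, the following is a minimal PyTorch sketch of a standard coordinate attention block (in the style of Hou et al., CVPR 2021). The class name `CoordinateAttention`, the channel reduction ratio `reduction`, and the use of mean pooling are illustrative assumptions; the paper's actual dense feature fusion module may differ.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal coordinate attention sketch (assumed, after Hou et al.).

    Pools the feature map along each spatial axis separately, so the
    resulting attention weights carry direction-aware positional
    information -- the property the abstract credits with preserving
    key features in the deep descriptors.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        # Separate 1x1 convolutions produce the per-axis attention maps.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Pool along width -> (n, c, h, 1); pool along height and
        # transpose -> (n, c, w, 1), so both can share one transform.
        x_h = x.mean(dim=3, keepdim=True)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        # Reweight the input features with position-aware attention.
        return x * a_h * a_w

# Example: reweight a hypothetical 64-channel feature map.
feats = torch.randn(2, 64, 32, 32)
att = CoordinateAttention(64)
print(att(feats).shape)  # torch.Size([2, 64, 32, 32])
```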