To fully exploit the complementary information in optical and synthetic aperture radar (SAR) images, the two modalities must first be coregistered with high accuracy. Because of their large radiometric and geometric differences, matching high-resolution optical and SAR images is quite challenging. Existing deep learning-based methods have shown advantages over traditional approaches, but the performance gain remains limited. In this article, we explore a better network framework for high-resolution optical and SAR image matching from three aspects. First, we propose an effective multilevel feature fusion method, which exploits both the low-level fine-grained features for precise feature localization and the high-level semantic features for better discriminative ability. Second, a feature channel excitation step is performed with a novel multifrequency channel attention module, which enables image features of different types and levels to collaborate effectively and produces matching features with high diversity. Third, a self-adaptive weighting loss is introduced, in which each sample is assigned an adaptive weighting factor so that the information contained in all nearby samples can be better exploited. Under a pseudo-Siamese architecture, the proposed optical and SAR image matching network (OSMNet) is trained and tested on a large and diverse high-resolution optical and SAR dataset. Extensive experiments demonstrate that each component of the proposed framework improves the matching accuracy. Moreover, OSMNet is markedly superior to state-of-the-art handcrafted approaches on images of different land-cover types.
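
To make the pseudo-Siamese setting concrete, the sketch below shows a minimal two-branch matcher in PyTorch: the optical and SAR branches share the same topology but not their weights, and a cross-correlation head produces a dense similarity surface. This is only an illustrative sketch under our own assumptions; the layer widths, branch depth, and the correlation-based matching head are not taken from OSMNet, and the multilevel fusion, multifrequency channel attention, and self-adaptive weighting loss described above are omitted.

# Illustrative pseudo-Siamese matcher sketch (not the authors' OSMNet).
# Layer widths and the cross-correlation head are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    # 3x3 convolution -> batch norm -> ReLU, a generic building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class PseudoSiameseMatcher(nn.Module):
    # Two branches with identical topology but unshared weights, since the
    # radiometric gap between optical and SAR argues against weight sharing.
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        self.opt_branch = self._make_branch(1, channels)
        self.sar_branch = self._make_branch(1, channels)

    @staticmethod
    def _make_branch(in_ch, channels):
        layers, prev = [], in_ch
        for ch in channels:
            layers += [conv_block(prev, ch), nn.MaxPool2d(2)]
            prev = ch
        return nn.Sequential(*layers)

    def forward(self, opt_patch, sar_template):
        # Dense similarity of the SAR template slid over the larger optical
        # search patch, computed as batched cross-correlation of L2-normalized
        # feature maps (grouped convolution, one group per batch item).
        f_opt = F.normalize(self.opt_branch(opt_patch), dim=1)   # (B, C, H, W)
        f_sar = F.normalize(self.sar_branch(sar_template), dim=1)  # (B, C, h, w)
        b = f_sar.shape[0]
        sim = F.conv2d(
            f_opt.reshape(1, -1, *f_opt.shape[-2:]),  # (1, B*C, H, W)
            f_sar,                                     # B kernels of shape (C, h, w)
            groups=b,
        )
        return sim  # (1, B, H-h+1, W-w+1) similarity surface in feature coordinates


if __name__ == "__main__":
    net = PseudoSiameseMatcher()
    opt = torch.randn(2, 1, 256, 256)   # optical search patch
    sar = torch.randn(2, 1, 192, 192)   # SAR template
    print(net(opt, sar).shape)          # torch.Size([1, 2, 9, 9])

In practice, the peak of the similarity surface would give the estimated offset between the SAR template and the optical search patch; OSMNet's additional components (multilevel fusion, channel attention, adaptive sample weighting) refine the features and the training signal rather than change this overall matching layout.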