As a prerequisite for many optical remote sensing applications, image matching identifies correspondences that are used to estimate the geometric relationship between two images. Feature-based algorithms, such as the Scale-Invariant Feature Transform (SIFT), address this task by using detectors to identify keypoints and then applying descriptors to represent these keypoints as feature vectors; vectors from different images are then matched by Euclidean distance to produce corresponding points. Deep learning networks are widely used in the design of detectors and descriptors because of their powerful representation capabilities. However, previous methods that prioritize repeatability tend to yield keypoints with low localization accuracy. To address this problem, we define optimal keypoints as “detected, repeatable, reasonable, and distinguishable” (DRRD) and, based on this definition, introduce a DRRD framework for optical remote sensing image matching. Specifically, we design a deep-learning-based descriptor network that is trained in a self-supervised manner with a novel adaptive mix content triplet (AMCT) loss function, and we establish a training procedure for remote sensing images using a multi-spectral image dataset. The proposed method obtained 1694 correct correspondences from 8192 detected keypoints in multi-temporal image matching tasks and achieved sub-pixel accuracy in image rectification experiments. Comparison experiments show that the proposed DRRD framework outperforms other state-of-the-art methods.
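The Euclidean-distance matching step mentioned above can be sketched as follows. This is an illustrative example, not the authors' implementation: it matches each descriptor vector from one image to its nearest neighbour among the other image's descriptors, keeping a match only when it passes Lowe's ratio test (the `ratio` threshold of 0.8 and the toy 4-D descriptors are assumptions for demonstration).

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each row of desc_a to its nearest neighbour in desc_b.

    A candidate match (i, j) is kept only if the nearest Euclidean
    distance is below `ratio` times the second-nearest distance
    (Lowe's ratio test), which rejects ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances to all candidates
        j1, j2 = np.argsort(dists)[:2]              # indices of the two nearest neighbours
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# Toy data: 4-D descriptors; desc_b holds noisy copies of three rows of desc_a,
# standing in for the same keypoints observed in a second image.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 4))
desc_b = desc_a[[2, 0, 4]] + rng.normal(scale=0.01, size=(3, 4))

matches = match_descriptors(desc_a, desc_b)
print(matches)
```

In practice the descriptors would come from a detector/descriptor pipeline such as SIFT or a learned network, and the brute-force loop would be replaced by a k-d tree or batched matrix distance computation for large keypoint sets.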