Optical and SAR image registration is the primary procedure to exploit the complementary information from the two different image modal types. Although extensive research has been conducted to narrow down the vast radiometric and geometric gaps so as to extract homogeneous characters for feature point matching, few works have considered the registration issue for non-flat terrains, which will bring in more difficulties for not only sparse feature point matching but also outlier removal and geometric relationship estimation. This article addresses these issues with a novel and effective optical-SAR image registration framework. Firstly, sparse feature points are detected based on the phase congruency moment map of the textureless SAR image (SAR-PC-Moment), which helps to identify salient local regions. Then a template matching process using very large local image patches is conducted, which increases the matching accuracy by a significant margin. Secondly, a mutual verification-based initial outlier removal method is proposed, which takes advantage of the different mechanisms of sparse and dense matching and requires no geometric consistency assumption within the inliers. These two procedures will produce a putative correspondence feature point (CP) set with a low outlier ratio and high reliability. In the third step, the putative CPs are used to segment the large input image of non-flat terrain into dozens of locally flat areas using a recursive random sample consensus (RANSAC) method, with each locally flat area co-registered using an affine transformation. As for the mountainous areas with sharp elevation variations, anchor CPs are first identified, and then optical flow-based pixelwise dense matching is conducted. In the experimental section, ablation studies using four precisely co-registered optical-SAR image pairs of flat terrain quantitatively verify the effectiveness of the proposed SAR-PC-Moment-based feature point detector, big template matching strategy, and mutual verification-based outlier removal method. Registration results on four 1 m-resolution non-flat image pairs prove that the proposed framework is able to produce robust and quite accurate registration results.