Road extraction from a remote sensing image is a research hotspot due to its broad range of applications. Despite recent advancements, achieving precise road extraction remains challenging. Since a road is thin and long, roadside objects and shadows cause occlusions, thus influencing the distinguishment of the road. Masked image modeling reconstructs masked areas from unmasked areas, which is similar to the process of inferring occluded roads from nonoccluded areas. Therefore, we believe that mask image modeling is beneficial for indicating occluded areas from other areas, thus alleviating the occlusion issue in remote sensing image road extraction. In this paper, we propose a remote sensing image road extraction network named RemainNet, which is based on mask image modeling. RemainNet consists of a backbone, image prediction module, and semantic prediction module. An image prediction module reconstructs a masked area RGB value from unmasked areas. Apart from reconstructing original remote sensing images, a semantic prediction module of RemainNet also extracts roads from masked images. Extensive experiments are carried out on the Massachusetts Roads dataset and DeepGlobe Road Extraction dataset; the proposed RemainNet improves 0.82–1.70% IoU compared with other state-of-the-art road extraction methods.