Rice is one of the world’s three major food crops, second only to sugarcane and corn in output. Timely and accurate rice extraction plays a vital role in ensuring food security. In this study, R-Unet for rice extraction was proposed based on Sentinel-2 and time-series Sentinel-1, including an attention-residual module and a multi-scale feature fusion (MFF) module. The attention-residual module deepened the network depth of the encoder and prevented information loss. The MFF module fused the high-level and low-level rice features at channel and spatial scales. After training, validation, and testing on seven datasets, R-Unet performed best on the test samples of Dataset 07, which contained optical and synthetic aperture radar (SAR) features. Precision, intersection, and union (IOU), F1-score, and Matthews correlation coefficient (MCC) were 0.948, 0.853, 0.921, and 0.888, respectively, outperforming the baseline models. Finally, the comparative analysis between R-Unet and classic models was completed in Dataset 07. The results showed that R-Unet had the best rice extraction effect, and the highest scores of precision, IOU, MCC, and F1-score were increased by 5.2%, 14.6%, 11.8%, and 9.3%, respectively. Therefore, the R-Unet proposed in this study can combine open-source sentinel images to extract rice timely and accurately, providing important information for governments to implement decisions on agricultural management.