ABSTRACT In recent years, high-resolution (HR) remote sensing images have brought significant changes to land cover monitoring. Monitoring water resources, which are vital and scarce for human survival, is a crucial aspect of land cover monitoring and forms the foundation for water resource allocation in regions facing shortages. A universal method is therefore needed for large-scale water resource monitoring. We examine the network structure to improve its interpretability, which allows us to reduce the number of network layers, and propose a lightweight pixel-wise semantic segmentation model that extracts waterbodies from GF-1 remote sensing images with high accuracy and high speed. The proposed model handles semantic segmentation of both large and small waterbodies, extracting fine-grained edge and contour information from high-resolution feature maps. It also efficiently extracts multi-scale high-level semantic information using residual convolutional blocks and dilated convolutional blocks. The multi-scale feature maps are fused, and the binary (water/non-water) classification is predicted by a Support Vector Machine (SVM). Furthermore, the paper introduces a model training method with an adaptive learning rate that reduces the overall training time. To validate the model's performance, a dataset is constructed from GF-1 remote sensing images. Experimental results show that, compared with five other models, the proposed method achieves the highest accuracy and the shortest training time.
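The abstract does not specify how the residual and dilated convolutional blocks are configured. The following minimal pure-Python sketch illustrates why dilated convolutions help capture multi-scale context without adding parameters: a 3x3 kernel with dilation d samples pixels d apart, so stacking stride-1 layers grows the receptive field as 1 + sum((k - 1) * d). It assumes square odd-sized kernels, stride 1, zero padding, and a ReLU residual connection; the function names `dilated_conv2d`, `residual_dilated_block`, and `receptive_field` are hypothetical and are not taken from the proposed model.

```python
from typing import List

Matrix = List[List[float]]


def dilated_conv2d(x: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """'Same'-size 2-D dilated cross-correlation with zero padding (hypothetical helper)."""
    h, w = len(x), len(x[0])
    k = len(kernel)                # assumed square, odd-sized kernel
    r = (k // 2) * dilation        # effective radius of the dilated kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i - r + a * dilation
                    jj = j - r + b * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the image
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def residual_dilated_block(x: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """Assumed block form: output = x + ReLU(dilated_conv(x))."""
    y = relu(dilated_conv2d(x, kernel, dilation))
    return [[xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

With three 3x3 layers, dilations [1, 2, 4] give a receptive field of 15 pixels versus 7 for three ordinary convolutions, while the parameter count stays the same. The residual connection keeps the input signal available, which is one common reason for pairing residual and dilated blocks.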