ABSTRACT Accurate crop mapping provides important information for government decision-making and agricultural management. Recent advancements in deep learning have significantly enhanced the capabilities of remote sensing-based crop mapping. In this study, we developed a new deep learning approach, DSH, which comprises three modules: DeepLabV3+ (D), channel self-attention (S), and histogram matching (H). The DeepLabV3+ module learns the distribution of crops across the spatial and spectral dimensions. The channel self-attention module enhances the weights of the important features. The histogram-matching module addresses the domain gap between the training and testing images and enhances the transferability of DSH. In addition, the input data of DSH are monthly synthesized Sentinel-2 data, which relaxes the data collection requirements. The performance of DSH was evaluated and benchmarked against HRNetV2, DeepLabV3+, and random forest (RF) at eight sites in the U.S., China, and France with different farming structures, climate conditions, and landscape complexities. Temporal transfer experiments revealed that, at each study site, the classification performance of DSH improved as the growth season progressed, peaked in August (the peak growth season), and then declined. The DSH model trained during the peak growth season demonstrated superior temporal transferability. Spatial transfer experiments demonstrated that transferring DSH to sites with similar planting structures achieved higher precision than full spatial transfers. Spatiotemporal transfer experiments conducted at eight sites over three years (2020–2022) demonstrated an average overall accuracy (OA) of 88.1% for DSH, highlighting the importance of similar planting structures for effective spatiotemporal transfer. DSH outperformed HRNetV2, DeepLabV3+, and RF in OA, producer’s accuracy, and user’s accuracy at all eight study sites.
Notably, DSH exhibited comparable performance during both the peak and full growth seasons, with an average OA of 93.9%. The classification accuracy of test images after histogram matching remained stable from 2020 to 2022, whereas the accuracy of raw test images fluctuated annually. This indicates that histogram matching is effective in improving the transferability of DSH. Embedding the channel self-attention module at the position of the high-level features improved DSH’s performance more than the other configurations did. Visualizing features from the channel self-attention and high-level layers revealed that integrating the channel self-attention module with DeepLabV3+’s high-level features significantly improved crop mapping by enhancing relevant semantic information. Overall, DSH exhibits robust spatiotemporal generalizability and requires minimal input data, making it suitable for large-scale crop mapping.
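
As an illustration of the histogram-matching idea described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes that one spectral band of a test image is remapped so that its empirical distribution follows the same band of a training (reference) image. The function name match_histogram and the flat-list representation of a band are hypothetical choices made for this example.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Remap `source` pixel values so their empirical distribution
    follows that of `reference` (single band, flat lists of numbers).

    Hypothetical helper for illustration only; the assumption here is
    that a test-image band is matched to a training-image band.
    """
    if not source or not reference:
        raise ValueError("source and reference must be non-empty")

    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)

    matched = []
    for value in source:
        # Rank of this value in the source band: the number of source
        # pixels less than or equal to it (empirical CDF times n_src).
        rank = bisect_right(src_sorted, value)
        # Index of the reference value at the same quantile, i.e.
        # ceil(rank * n_ref / n_src) - 1, using integer arithmetic
        # to avoid floating-point rounding issues.
        idx = (rank * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[idx])
    return matched


if __name__ == "__main__":
    test_band = [30, 10, 20]    # made-up test-image pixel values
    train_band = [5, 1, 3]      # made-up training-image pixel values
    print(match_histogram(test_band, train_band))
```

The mapping preserves the rank order of the source pixels while borrowing the value range of the reference, which is why it can reduce radiometric differences between acquisition years without changing the spatial pattern of the image. For real multi-band imagery, an array-based routine applied band by band would be the more practical choice.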