Fine-grained management of rice fields can enhance the yield and quality of rice crops. Fine classification of rice fields is challenging because of interference from similar vegetation, the irregular shapes of natural fields, and complex scale variations. This paper introduces the Rice Attention Cascade Network (RACNet) for the fine classification of rice fields in high-resolution satellite remote sensing imagery. The network uses the Hybrid Task Cascade network as its base framework and takes multimodal input that combines spectral bands with spectral indices, which strengthens feature discrimination between rice and similar vegetation. First, a Channel Attention Deformable-ResNet (CAD-ResNet) is designed to enhance the feature representation of rice across channels, and its deformable convolutions improve the ability to capture irregular field shapes. Then, to address complex scale changes, the multi-scale features extracted by CAD-ResNet are progressively fused using an Asymptotic Feature Pyramid, which reduces the loss of scale information between non-adjacent layers. Experiments on the Meishan rice dataset show that the proposed method accurately performs instance segmentation of fragmented or irregularly shaped rice fields. RACNet achieves an AP50 of 50.8%.
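
As a rough illustration of what a mixed spectral-and-index input can look like, the sketch below stacks raw bands with one index layer into a channel-first input. It is not the RACNet preprocessing pipeline: the abstract does not state which bands or indices are used, so the four bands (blue, green, red, near-infrared), the choice of NDVI as the index, and the names `normalized_difference` and `stack_spectral_and_indices` are all assumptions made for illustration.

```python
from typing import Dict, List

# One 2-D raster band: rows of per-pixel reflectance values.
Band = List[List[float]]


def normalized_difference(a: Band, b: Band, eps: float = 1e-6) -> Band:
    """Per-pixel (a - b) / (a + b); eps guards against division by zero."""
    return [
        [(x - y) / (x + y + eps) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def stack_spectral_and_indices(bands: Dict[str, Band]) -> List[Band]:
    """Build a channel-first stack: raw bands followed by index layers.

    Assumption: the input holds 'blue', 'green', 'red', and 'nir' bands,
    and NDVI = (NIR - Red) / (NIR + Red) is the only index appended.
    """
    ndvi = normalized_difference(bands["nir"], bands["red"])
    return [bands["blue"], bands["green"], bands["red"], bands["nir"], ndvi]


if __name__ == "__main__":
    # Tiny hypothetical 1 x 2 image: one vegetated pixel, one bare pixel.
    example = {
        "blue": [[0.05, 0.05]],
        "green": [[0.08, 0.08]],
        "red": [[0.10, 0.20]],
        "nir": [[0.50, 0.20]],
    }
    stack = stack_spectral_and_indices(example)
    print(len(stack), "channels; NDVI row:", stack[4][0])
```

Appending index layers as extra channels leaves the backbone free to weight them against the raw bands, which is the kind of per-channel emphasis a channel attention module is meant to learn.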