In agricultural environments, recognizing the walking ground and state of tracked mobile robots is a challenging task because it is affected by clay conditions and other external environmental disturbances. This paper therefore proposes a novel data processing method and an efficient classifier. First, the noise signals collected on the left and right sides of the robot are averaged, and a time-wavelet-time domain transformation is performed using the Mallat algorithm to achieve nonlinear enhancement of data features and signal denoising. Second, the Gramian Angular Summation Field (GASF) is introduced to transform the sequence data into single-channel images that capture the periodicity and similarity of the time series. Next, the images of three sets of sequences are stacked along the channel dimension in RGB format, achieving feature fusion of multi-source data. Finally, a supervised learning classifier named Attention-fused Residual Convolutional Neural Network (ANR-CNN) is proposed. In this network, the combination of channel and spatial attention mechanisms captures important features of the feature map in both the channel and spatial dimensions, while the convolutional residual structure enhances feature transmission and improves the model's classification accuracy. Experimental results demonstrate that the proposed data augmentation method effectively enhances model performance and that the classification accuracy of ANR-CNN reaches 92.35%. This indicates that the walking ground and state of tracked mobile robots in agricultural environments can be recognized accurately.
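
The GASF encoding and RGB stacking steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard GASF definition (rescale each sequence to [-1, 1], take the angle phi = arccos(x), and form cos(phi_i + phi_j)). The function names `gasf` and `stack_rgb`, the three synthetic input sequences, and the handling of constant sequences are hypothetical choices made only for illustration.

```python
import numpy as np


def gasf(series):
    """Gramian Angular Summation Field of a 1-D sequence (standard definition)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Assumption: a constant sequence is mapped to zeros after rescaling.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale to [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)              # polar-angle encoding
    # Entry (i, j) is cos(phi_i + phi_j): one single-channel image.
    return np.cos(phi[:, None] + phi[None, :])


def stack_rgb(seq_a, seq_b, seq_c):
    """Stack three GASF images along the channel axis (height, width, 3)."""
    return np.stack([gasf(seq_a), gasf(seq_b), gasf(seq_c)], axis=-1)


# Hypothetical stand-ins for the three measured sequences.
t = np.linspace(0.0, 1.0, 64)
image = stack_rgb(np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 5 * t), t ** 2)
```

Each channel is a symmetric matrix with values in [-1, 1], so the stacked array can be rescaled to an ordinary RGB image before it is passed to a convolutional classifier.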