Abstract

Complex background noise and the scarcity of real fault samples seriously degrade the diagnostic accuracy of fault diagnosis models. To address this, this paper proposes the sparse frequency spiral spectrum (SFSM), a noise-robust two-dimensional feature map based on sparse representation theory. A bridge penalty is imposed on the sparse representation model to select impact components accurately, and the fast iterative shrinkage-thresholding algorithm (FISTA) is used to solve for the sparse representation coefficients. The sparse reconstructed signal is then obtained by convolving the impact patterns with these coefficients, which yields a sparse reconstruction algorithm with reduced computational complexity. Furthermore, to augment the small sample set, nonlinear activation-free blocks (NAF Blocks) are embedded into a latent diffusion model, significantly improving both the speed and the quality of image generation. A Swin Transformer is then used for feature extraction and classification, further improving diagnostic performance. The method is validated on three datasets: the XJTU-SY dataset, a dataset collected on a bearing experimental platform, and an enterprise engineering dataset. Experimental results show that the structure and generalization ability of NAF Blocks are key to improving image quality and inference speed. Ablation and noise-robustness tests confirm the noise-suppression capability that the SFSM feature representation gives the proposed method. Finally, the ability of the Swin Transformer to extract features from and classify SFSM images is verified. The proposed method achieves diagnostic accuracies of 99.10% and 98.7% on the XJTU-SY and experimental platform datasets, respectively.
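The sparse reconstruction step described above (solve for sparse coefficients with FISTA, then convolve them with an impact pattern) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: the abstract does not give the bridge exponent q, so the sketch uses q = 1 (the convex member of the bridge family |x|^q, i.e. the lasso), whose proximal operator is the closed-form soft threshold; a q < 1 bridge penalty would need a different thresholding rule. The impact pattern (a decaying 3 kHz oscillation at 20 kHz), the penalty weight `lam`, the noise level, and all function names (`conv_op`, `conv_adj`, `fista_conv_sparse`) are hypothetical.

```python
import numpy as np

def conv_op(x, d):
    """Synthesis operator A: convolve coefficients x with impact pattern d, truncated to len(x)."""
    return np.convolve(x, d)[: len(x)]

def conv_adj(r, d):
    """Adjoint A^T of conv_op, i.e. a truncated cross-correlation of r with d."""
    n = len(r)
    return np.convolve(r[::-1], d)[:n][::-1]

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (bridge penalty with q = 1, an assumption here)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_conv_sparse(y, d, lam, n_iter=300):
    """Minimise 0.5 * ||y - d * x||^2 + lam * ||x||_1 with FISTA."""
    lip = np.sum(np.abs(d)) ** 2      # upper bound on ||A||^2 via Young's inequality
    x = np.zeros_like(y)
    z = x.copy()                      # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = conv_adj(conv_op(z, d) - y, d)            # gradient of the data term at z
        x_new = soft_threshold(z - grad / lip, lam / lip)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov momentum step
        x, t = x_new, t_new
    return x

# Hypothetical impact pattern: a decaying 3 kHz oscillation sampled at 20 kHz.
fs = 20_000
tt = np.arange(64) / fs
d = np.exp(-800.0 * tt) * np.sin(2 * np.pi * 3000.0 * tt)
d /= np.linalg.norm(d)

# Synthetic signal: periodic impacts buried in Gaussian noise.
rng = np.random.default_rng(0)
n = 2048
x_true = np.zeros(n)
x_true[[100, 500, 900, 1300, 1700]] = 1.0
clean = conv_op(x_true, d)
noisy = clean + 0.1 * rng.standard_normal(n)

x_hat = fista_conv_sparse(noisy, d, lam=0.3)
recon = conv_op(x_hat, d)             # sparse reconstructed signal
print("noisy error:", np.linalg.norm(noisy - clean))
print("reconstruction error:", np.linalg.norm(recon - clean))
```

Because the synthesis operator is a plain convolution, each FISTA iteration costs two short convolutions rather than a product with an explicit dictionary matrix, which is one way the reduced computational complexity claimed above can arise.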