While existing models have achieved significant progress on fault diagnosis tasks, it remains unclear what interpretable temporal dependencies they capture to resist noise. Previous studies have ignored the inherent non-stationarity of fault data, which is the basis for capturing reliable temporal dependencies in heavy-noise operating scenarios. To tackle these problems, we propose a non-stationary perception network (NPFormer). First, we revisit the large-kernel design in the early stage of the network and demonstrate that it stabilizes the attention map when modeling non-stationary series, making it a more powerful paradigm under heavy noise interference. Second, we develop an adaptive non-stationarity embedding block, which decomposes the raw data into multiple components and adaptively attenuates the non-stationarity of mechanical signals for better predictability. Finally, we design a multi-scale fusion attention that captures fine-grained features with a global scope while preserving localized spatial–temporal correlations. Extensive experiments on three public datasets and a practical aero-engine engineering application show that NPFormer significantly outperforms existing state-of-the-art methods. We visualize the weight information to reveal how NPFormer works, demonstrating that a stable attention map is a clear indicator of noise resistance. We also quantitatively characterize the effect of noise on the model, and the model's anti-noise cues, during inference. Code and models will be made available for further study.
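The abstract does not specify how the adaptive non-stationarity embedding block is implemented. As a rough, generic sketch of the underlying idea only (not NPFormer's actual block), a signal can be decomposed into a slow trend and a residual component, with the residual normalized per instance so that its statistics become approximately stationary; the function names, window size `k`, and decomposition choice below are all illustrative assumptions.

```python
import numpy as np

def moving_average(x, k=25):
    # Trend component via a centered moving average (edge-padded);
    # k is an assumed, illustrative window length.
    pad = k // 2
    xp = np.pad(x, (pad, k - 1 - pad), mode="edge")
    return np.convolve(xp, np.ones(k) / k, mode="valid")

def attenuate_nonstationarity(x, k=25, eps=1e-5):
    """Decompose a 1-D signal into trend + residual, then normalize the
    residual per instance so its mean/variance are (roughly) stationary.
    This is a generic illustration, not the paper's embedding block."""
    trend = moving_average(x, k)
    resid = x - trend
    mu, sigma = resid.mean(), resid.std()
    return trend, (resid - mu) / (sigma + eps)

# Example: a drifting sinusoid with slowly growing amplitude,
# mimicking a non-stationary mechanical signal.
t = np.linspace(0, 10, 1000)
signal = 0.5 * t + (1 + 0.2 * t) * np.sin(2 * np.pi * 5 * t)
trend, stationary = attenuate_nonstationarity(signal)
```

After this step, a downstream attention module would operate on the normalized residual, whose statistics no longer drift with time.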