Deep Learning (DL) models have been widely used in the field of Synthetic Aperture Radar Automatic Target Recognition (SAR-ATR) and have achieved excellent performance. However, the black-box nature of DL models has drawn criticism, especially in SAR-ATR applications, which are closely tied to the national defense and security domain. To address this issue, a new interpretable recognition model, Physics-Guided BagNet (PGBN), is proposed in this article. The model adopts an interpretable convolutional neural network framework and uses time–frequency analysis to extract physical scattering features from SAR images. Based on these physical scattering features, an unsupervised segmentation method is proposed to distinguish targets from the background in SAR images. Building on the segmentation result, a structure is designed that constrains the model's spatial attention to focus on the targets themselves rather than the background, thereby making the model's decisions more consistent with physical principles. In contrast to previous interpretability methods, this model combines an interpretable structure with physical interpretability, further reducing the model's risk of recognition errors. Experiments on the MSTAR dataset verify that the PGBN model exhibits excellent interpretability and recognition performance, and comparative heatmap experiments indicate that the physical feature guidance module presented in this article constrains the model to focus on the target itself rather than the background.