Deep learning algorithms have achieved state-of-the-art performance in synthetic aperture radar (SAR) automatic target recognition (ATR) tasks. Their generalization guarantees, however, rest on the assumption that training and test samples are independent and identically distributed (i.i.d.), which rarely holds in practical ATR scenarios. In this paper, we propose a novel contrastive feature disentangling framework, termed ConFeDent, to learn features that generalize better under a weaker distribution-consistency condition. More specifically, ConFeDent describes the semantic interactions between arbitrary pairs of SAR training samples instead of treating each sample independently. It implicitly disentangles features encoding pose and identity knowledge from whole samples by means of a semi-parametric geometric transformation model and a second-order energy model. In particular, in addition to the identity label, we use deductively derived geometric knowledge as supervision to teach the model the concept of aspect-angle variation. A progressively amortized inference scheme is constructed for efficient feature learning and recognition in an end-to-end manner. Finally, we further present a strengthened version, called ConFeDent+, which explicitly exploits and learns additional information from cross-category samples. Experimental results on the moving and stationary target acquisition and recognition (MSTAR) benchmark demonstrate the effectiveness of the proposed models for SAR ATR. In particular, we validate the algorithms in a more challenging scenario in which the aspect-angle ranges of the training and test samples are allowed to differ. Our models achieve substantially higher recognition accuracy than other SAR ATR algorithms.
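To make the pairwise idea concrete, the following is a minimal, heavily simplified sketch of how a pose/identity disentangling objective over sample pairs with aspect-angle supervision might be set up in PyTorch. It is not the authors' ConFeDent implementation; the backbone, the head sizes, the relative-pose regression target, and the unweighted loss sum are all illustrative assumptions introduced here for clarity.

    # Illustrative sketch only: pairwise identity classification plus
    # geometry-supervised relative-pose regression. All module and
    # hyperparameter choices are hypothetical, not the paper's method.
    import torch
    import torch.nn as nn

    class PairwiseDisentangler(nn.Module):
        def __init__(self, num_classes, id_dim=64, pose_dim=16):
            super().__init__()
            # Shared CNN backbone for single-channel SAR chips.
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Split the embedding into identity-related and pose-related parts.
            self.id_head = nn.Linear(64, id_dim)
            self.pose_head = nn.Linear(64, pose_dim)
            self.classifier = nn.Linear(id_dim, num_classes)
            # Pairwise head: predicts (sin, cos) of the relative aspect angle
            # between the two samples from their pose features only.
            self.rel_pose = nn.Linear(2 * pose_dim, 2)

        def forward(self, x1, x2):
            f1, f2 = self.backbone(x1), self.backbone(x2)
            logits1 = self.classifier(self.id_head(f1))
            logits2 = self.classifier(self.id_head(f2))
            rel = self.rel_pose(torch.cat([self.pose_head(f1),
                                           self.pose_head(f2)], dim=1))
            return logits1, logits2, rel

    def pair_loss(model, x1, y1, a1, x2, y2, a2):
        """x*: image pairs, y*: identity labels, a*: aspect angles (radians)."""
        logits1, logits2, rel = model(x1, x2)
        ce = nn.functional.cross_entropy
        id_loss = ce(logits1, y1) + ce(logits2, y2)
        # Geometry-derived supervision: regress sin/cos of the angle difference.
        d = a2 - a1
        target = torch.stack([torch.sin(d), torch.cos(d)], dim=1)
        pose_loss = nn.functional.mse_loss(rel, target)
        return id_loss + pose_loss

In this toy setup, the identity branch only ever sees the identity features while the relative-pose branch only ever sees the pose features of both samples, which is one simple way to encourage the two factors to separate; the paper's actual mechanism (the semi-parametric geometric transformation model and the second-order energy model) is more involved than this sketch.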