Precise gamma-ray spectral analysis is crucial in high-stakes applications such as nuclear security. Efforts to apply machine learning (ML) to this analysis are limited by how closely the training data resemble the testing scenarios: the underlying spectral shape of synthetic data may not perfectly reflect measured configurations, and measurement campaigns may be restricted by resource constraints. Consequently, ML algorithms for isotope identification must maintain accurate classification performance under domain shifts between the training and testing data. To this end, four classifiers, namely Ridge, Random Forest, Extreme Gradient Boosting (XGB), and Multilayer Perceptron, were trained on the same dataset and evaluated on twelve other datasets with varying standoff distances, shielding, and background configurations. A tailored statistical approach was introduced to quantify the similarity between the training and testing configurations, which was then related to predictive performance. Wilcoxon signed-rank tests revealed that the one-vs-rest (OVR)-wrapped XGB classifier significantly outperformed the other algorithms, with confidence levels of 99.0% or above for the 133Ba, 60Co, 137Cs, and 152Eu sources. These findings outline techniques that promote the development of robust ML-based approaches for isotope identification.
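
To illustrate the kind of paired comparison that a Wilcoxon signed-rank test performs across test datasets, the following is a minimal sketch in plain Python. The function name, the one-sided exact formulation, and the twelve per-dataset scores are assumptions made purely for illustration; the scores are placeholder values, not results from this work, and the study's exact test configuration may differ. In practice, a library routine such as `scipy.stats.wilcoxon` would typically be used instead.

```python
def wilcoxon_signed_rank_exact(scores_a, scores_b):
    """One-sided exact Wilcoxon signed-rank test (H1: A tends to exceed B).

    Illustrative sketch only: assumes paired scores with no zero
    differences and no tied absolute differences.
    Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied absolute differences are not handled in this sketch")

    # Rank the absolute differences from 1 (smallest) to n (largest).
    rank = {value: i + 1 for i, value in enumerate(abs_sorted)}
    # Test statistic: sum of the ranks belonging to positive differences.
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # Exact null distribution: under H0 each rank is positive with
    # probability 1/2, so count the subsets of {1..n} by their rank sum.
    n = len(diffs)
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    # One-sided p-value: P(W+ >= observed w_plus) under H0.
    p_value = sum(counts[w_plus:]) / 2 ** n
    return w_plus, p_value


# Hypothetical per-dataset accuracies (percent) on twelve test datasets.
# These are placeholders for illustration, NOT values from the study.
xgb_scores = [97, 95, 92, 90, 88, 85, 83, 80, 78, 75, 72, 70]
baseline_scores = [95, 92, 88, 85, 82, 78, 75, 71, 68, 64, 60, 71]

w_plus, p_value = wilcoxon_signed_rank_exact(xgb_scores, baseline_scores)
# A confidence level of 99.0% or above corresponds to p_value <= 0.01.
significant_at_99 = p_value <= 0.01
print(w_plus, p_value, significant_at_99)
```

Because the test uses only the signs and ranks of the paired differences, it makes no normality assumption about the per-dataset scores, which suits comparisons across a small number of heterogeneous test configurations.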