ABSTRACT Measurement invariance of psychological test batteries is an essential quality criterion when the test batteries are administered in different cultural and language contexts. The purpose of this study was to examine the extent to which measurement model fit and measurement invariance across the two largest language groups in Switzerland (i.e., German and French speakers) can be assumed for selected general mental ability and personality tests used in the Swiss Armed Forces’ cadre selection process. For the model fit and invariance testing, we used Bayesian structural equation modeling (BSEM). Because the sizes of the language group samples were unbalanced, we reran the invariance testing with a subsampling procedure as a robustness check. The results showed that at least partial approximate scalar invariance can be assumed for the constructs. However, comparisons in the full sample and subsamples also showed that certain test items function differently across the language groups. The results are discussed with regard to the following three issues: First, we critically discuss the applied criterion and alternative effect size measures for assessing the practical importance of non-invariances. Second, we highlight potential remedies and further testing options that can be applied once certain items have been detected to function differently. Third, we discuss alternative modeling and invariance testing approaches to BSEM and outline future research avenues.