Abstract Power spectrum estimators are an important tool in efforts to detect the 21 cm brightness temperature fluctuations from neutral hydrogen at early times. An initial detection will likely be statistical in nature, meaning that it will not be possible to make a coherent map of the brightness temperature fluctuations; instead, only their variance will be measured against a background of noise and residual systematic effects. Optimal Quadratic Estimator (OQE)-based methods often apply an inverse covariance weighting to the data. However, inaccurate covariance modelling can lead to reduced sensitivity and, in some cases, severe signal loss. We recently proposed a Bayesian method to jointly estimate the 21 cm fluctuations, their power spectrum, and foreground emission. Instead of requiring a fixed a priori estimate of the covariance, we estimate the covariance as part of the inference. Choices of parametrization, particularly of the foregrounds, are subject to model errors and could lead to biases and other ill effects if not properly controlled. In this paper, we investigate the effects of inaccurate foreground models on 21 cm power spectrum recovery. Using simulated visibilities, we find that, even in the most extreme scenarios tested, our approach is capable of recovering 21 cm delay power spectrum estimates consistent with a known input signal for delays ≳ 300 ns (∼88% of the available Fourier modes). This is true even when using foreground models derived from modified foreground catalogs containing spatial and spectral perturbations at the quoted level of uncertainty on our foreground catalogs.