Simulating ozone variability at regional scales with chemistry transport models (CTMs) remains a challenge. We designed a multi-model intercomparison to evaluate, for the first time, four regional CTMs on a national scale for Germany. Simulations were conducted with LOTOS-EUROS, REM-CALGRID, COSMO-MUSCAT and WRF-Chem for 1 January to 31 December 2019, using prescribed emission information. In the operational evaluation, all models perform well, with average temporal correlations of MDA8 O3 in the range of 0.77–0.87 and RMSE values between 16.3 μg m−3 and 20.6 μg m−3. On average, model skill is better at rural background stations than at urban background stations, and better in spring than in summer. Our study confirms that the ensemble mean agrees better with the measurements than the individual models do. All models capture the larger local photochemical production in summer than in spring, as well as the observed differences between the urban and the rural background. We introduce a new indicator to evaluate the dynamic response of ozone to temperature. In summer, the ensemble spread in the ozone sensitivity to temperature is large and, on average, the models underestimate this sensitivity, which can be linked to a systematic underestimation of mid-level ozone concentrations. In spring, we observed an ozone episode that the models do not capture, most likely because of deficiencies in their representation of background ozone. As a follow-up, we recommend a diagnostic evaluation focused on the model descriptions of biogenic emissions and dry deposition, and a repetition of the operational and dynamic analyses over longer time frames.
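
The operational metrics named above (RMSE, temporal correlation, ensemble mean) and an ozone-temperature sensitivity can be illustrated with a minimal Python sketch. It makes several assumptions: the function names are hypothetical, the data are synthetic, and the sensitivity is taken here as the slope of an ordinary least-squares fit of MDA8 O3 against temperature. That slope is only a stand-in, since the abstract does not define the new indicator.

```python
import numpy as np


def rmse(model, obs):
    """Root-mean-square error between a modelled and an observed series."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((model - obs) ** 2)))


def temporal_correlation(model, obs):
    """Pearson correlation between a modelled and an observed time series."""
    return float(np.corrcoef(model, obs)[0, 1])


def ensemble_mean(members):
    """Unweighted mean over ensemble members (axis 0 = member, axis 1 = time)."""
    return np.mean(np.asarray(members, dtype=float), axis=0)


def ozone_temperature_slope(ozone, temperature):
    """Least-squares slope of ozone versus temperature (ozone units per degree).

    Assumption: a plain linear slope stands in for the paper's indicator.
    """
    slope, _intercept = np.polyfit(temperature, ozone, 1)
    return float(slope)


if __name__ == "__main__":
    # Synthetic daily data for illustration only; not results from the study.
    rng = np.random.default_rng(0)
    temperature = np.linspace(15.0, 35.0, 90)             # degrees Celsius
    obs = 40.0 + 3.0 * temperature + rng.normal(0, 8, 90)  # MDA8 O3, ug m-3
    members = [
        35.0 + 2.2 * temperature + rng.normal(0, 8, 90),
        50.0 + 2.6 * temperature + rng.normal(0, 8, 90),
        45.0 + 1.8 * temperature + rng.normal(0, 8, 90),
    ]
    mean_series = ensemble_mean(members)

    for name, series in [("member 1", members[0]), ("ensemble mean", mean_series)]:
        print(
            name,
            "RMSE:", round(rmse(series, obs), 1),
            "r:", round(temporal_correlation(series, obs), 2),
            "slope:", round(ozone_temperature_slope(series, temperature), 2),
        )
    print("observed slope:", round(ozone_temperature_slope(obs, temperature), 2))
```

Comparing each member's slope with the observed slope gives a simple view of whether a model under- or overestimates the temperature response; in practice this would be done per station and per season.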