Abstract. This study is conducted in the framework of the Air Quality Modelling Evaluation International Initiative (AQMEII) and aims at the operational evaluation of an ensemble of 12 regional-scale chemical transport models used to predict air quality over the North American (NA) and European (EU) continents for 2006. The modelled concentrations of ozone and CO, along with the meteorological fields of wind speed (WS) and direction (WD), temperature (T), and relative humidity (RH), are compared against high-quality in-flight measurements collected by instrumented commercial aircraft as part of the Measurements of OZone, water vapour, carbon monoxide and nitrogen oxides by Airbus In-service airCraft (MOZAIC) programme. The evaluation is carried out for five model domains positioned around four major airports in NA (Portland, Philadelphia, Atlanta, and Dallas) and one in Europe (Frankfurt), from the surface to 8.5 km. We compare mean vertical profiles of modelled and measured variables for all airports to compute error and variability statistics, analyse the altitudinal error correlation, and examine the seasonal error distribution for ozone, including an estimate of the bias introduced by the lateral boundary conditions (BCs). The results indicate that model performance is highly dependent on the variable, location, season, and height (e.g. surface, planetary boundary layer (PBL) or free troposphere) being analysed. Model performance for T is satisfactory at all sites (correlation coefficient in excess of 0.90 and fractional bias ≤ 0.01 K). WS is replicated less well within the PBL, where the models exhibit a positive bias in the first 100 m and underestimate the observed variability; above 1000 m, model performance for WS improves (correlation coefficient often above 0.9). The WD at NA airports is found to be biased in the PBL, primarily due to an overestimation of westerly winds.
RH is modelled well within the PBL, but in the free troposphere large discrepancies among models are observed, especially in EU. CO mixing ratios show the largest range of modelled-to-observed standard deviations of all the examined species at all heights and for all airports. Correlation coefficients for CO are typically below 0.6 for all sites and heights, and large errors are present at all heights, particularly in the first 250 m. Model performance for ozone in the PBL is generally good, with both bias and error within 20%. Profiles of ozone mixing ratios depend strongly on surface processes, as revealed by the sharp gradient in the first 2 km (10 to 20 ppb km−1). Modelled ozone in winter is biased low at all locations in NA, primarily due to an underestimation of ozone from the BCs. Most of the model error in the PBL is due to surface processes (emissions, transport, photochemistry), while errors originating aloft appear to have a relatively limited impact on model performance at the surface. Suggestions for future work include interpreting the model-to-model variability and common sources of model bias, and linking the CO and ozone biases to the bias in the meteorological fields. Based on the results of this study, we suggest possible in-depth, process-oriented and diagnostic investigations to be carried out next.
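For readers unfamiliar with the statistics quoted above (fractional bias, correlation coefficient, and the ratio of modelled-to-observed standard deviations), the following is a minimal Python sketch of how such metrics might be computed for paired model and observation samples at one altitude level. The definitions used here are common textbook forms and are an assumption; the abstract does not state the exact formulas, and the function names and sample values are hypothetical and purely illustrative.

```python
from math import sqrt
from statistics import mean, pstdev


def fractional_bias(model, obs):
    """Fractional bias, assumed here as 2 * (mean_m - mean_o) / (mean_m + mean_o)."""
    m_bar, o_bar = mean(model), mean(obs)
    return 2.0 * (m_bar - o_bar) / (m_bar + o_bar)


def pearson_r(model, obs):
    """Pearson correlation coefficient between paired model and observed values."""
    m_bar, o_bar = mean(model), mean(obs)
    cov = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    var_m = sum((m - m_bar) ** 2 for m in model)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    return cov / sqrt(var_m * var_o)


def std_ratio(model, obs):
    """Ratio of modelled to observed (population) standard deviation."""
    return pstdev(model) / pstdev(obs)


# Made-up ozone mixing ratios (ppb) at a single altitude level; not real data.
obs = [40.0, 45.0, 50.0, 55.0, 60.0]
model = [38.0, 44.0, 47.0, 56.0, 57.0]

print("fractional bias:", fractional_bias(model, obs))
print("correlation:", pearson_r(model, obs))
print("std ratio (model/obs):", std_ratio(model, obs))
```

A negative fractional bias indicates that the model underestimates the observations on average, and a standard deviation ratio below 1 indicates that the model underestimates the observed variability.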