Abstract. The National Air Quality Forecast Capability (NAQFC) project provides the US with operational and experimental real-time ozone predictions using two different versions of the three-dimensional Community Multi-scale Air Quality (CMAQ) modeling system. Routine evaluation using near-real-time AIRNow ozone measurements through 2011 showed that the operational ozone predictions performed better than the experimental ones. In this work, quality-controlled and quality-assured Air Quality System (AQS) ozone and nitrogen dioxide (NO2) observations are used to evaluate the experimental predictions in 2010. Both ozone and NO2 are found to be overestimated over the contiguous US (CONUS), with annual biases of +5.6 and +5.1 ppbv, respectively. The annual root mean square errors (RMSEs) are 15.4 ppbv for ozone and 13.4 ppbv for NO2. For both species the overpredictions are most pronounced in the summer. The locations of the AQS monitoring sites are also used to stratify comparisons by the degree of urbanization. Comparisons for six predefined US regions show the highest annual biases for ozone predictions in the Southeast (+10.5 ppbv) and for NO2 in the Lower Middle (+8.1 ppbv) and Pacific Coast (+7.1 ppbv) regions. The spatial distributions of the NO2 biases in August show distinctively high values in the Los Angeles, Houston, and New Orleans areas. In addition to the standard statistical metrics, daily maximum eight-hour ozone categorical statistics are calculated using the current US ambient air quality standard (75 ppbv) and a lower threshold (70 ppbv). Using the 75 ppbv standard, the hit rate and proportion correct over CONUS for the entire year are 0.64 and 0.96, respectively. Summertime biases show distinctive weekly patterns for ozone and NO2. Diurnal comparisons show that ozone overestimation is most severe in the morning, from 07:00 to 10:00 local time. For NO2, the morning predictions agree with the AQS observations reasonably well, but nighttime concentrations are overpredicted by around 100%.
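For readers unfamiliar with the metrics quoted above, the following Python sketch illustrates how mean bias, RMSE, hit rate, and proportion correct are commonly computed from paired predictions and observations. It is not the evaluation code used in this work. The function names, the sample values, and the choice to count an exceedance as a value strictly above the threshold are assumptions made for illustration, and the hit rate and proportion correct follow the usual contingency-table definitions, which may differ in detail from those applied here.

```python
import math

def mean_bias(pred, obs):
    """Mean of (prediction - observation); positive values indicate overestimation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean square error of the paired predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def categorical_stats(pred, obs, threshold=75.0):
    """Return (hit_rate, proportion_correct) for exceedances of `threshold` (ppbv).

    Assumption: an exceedance is a value strictly greater than the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for p, o in zip(pred, obs):
        observed = o > threshold
        forecast = p > threshold
        if observed and forecast:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    # Hit rate is undefined when no exceedance was observed.
    hit_rate = hits / (hits + misses) if (hits + misses) else float("nan")
    proportion_correct = (hits + correct_negatives) / total
    return hit_rate, proportion_correct

# Made-up daily maximum eight-hour ozone values (ppbv), for illustration only.
demo_pred = [80.0, 60.0, 78.0, 50.0, 70.0]
demo_obs = [76.0, 70.0, 72.0, 45.0, 77.0]

print(mean_bias(demo_pred, demo_obs))
print(rmse(demo_pred, demo_obs))
print(categorical_stats(demo_pred, demo_obs, threshold=75.0))
```

Under these definitions a proportion correct close to 1 can coexist with a much lower hit rate, because days on which neither the model nor the observations exceed the threshold dominate the sample and all count as correct.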