Abstract
In this work, an application of the online-coupled Weather Research and Forecasting model with chemistry (WRF/Chem) version 3.3.1 over East Asia for January, April, July, and October 2005 is evaluated and compared with results from a previous application of an offline model system, i.e., the Mesoscale Model and Community Multiscale Air Quality modeling system (MM5/CMAQ). WRF/Chem is evaluated against multiple observational datasets from satellites and from surface networks in mainland China, Hong Kong, Taiwan, and Japan. WRF/Chem simulates specific humidity (Q2) and downward longwave and shortwave radiation (GLW and GSW) well, with normalized mean biases (NMBs) within 24%. However, it shows moderate to large biases in 2-m temperature (T2) (NMBs of −9.8% to 75.6%) and precipitation (NMBs of 11.4% to 92.7%) in some months, and in 10-m wind speed (WS10) (NMBs of 66.5% to 101%) in all months. These biases indicate some limitations in the YSU planetary boundary layer scheme, the Purdue Lin cloud microphysics scheme, and the Grell–Devenyi ensemble cumulus scheme. WRF/Chem simulates the column abundances of gases reasonably well, with NMBs within 30% for most months. However, it moderately to significantly underpredicts the surface concentrations of major species at all sites in nearly all months, with NMBs of −72% to −53.8% for CO, −99.4% to −61.7% for NOx, −84.2% to −44.5% for SO2, −63.9% to −25.2% for PM2.5, and −68.9% to 33.3% for PM10. It also underpredicts aerosol optical depth (AOD) in all months except October, with NMBs of −38.7% to −16.2%. The model significantly overpredicts surface O3 concentrations at most sites in nearly all months, with NMBs of up to 160.3%, and surface NO3− concentrations at the Tsinghua site in all months.
Possible reasons for the large underpredictions include the following:
- underestimates of the anthropogenic emissions of CO, SO2, and primary aerosol;
- inappropriate vertical distributions of SO2 and NO2 emissions;
- uncertainties in upper boundary conditions (e.g., for O3 and CO);
- missing or inaccurate model representations (e.g., of secondary organic aerosol formation, gas/particle partitioning, dust emissions, and dry and wet deposition);
- inaccurate meteorological fields (e.g., overpredictions of WS10 and precipitation but underpredictions of T2);
- large uncertainties in satellite retrievals (e.g., for column SO2).

Compared with MM5, WRF generally performs worse in meteorological predictions. This is particularly true for T2, WS10, GSW, GLW, and cloud fraction in all months, and for Q2 and precipitation in January and October, owing to limitations in the physics schemes and parameterizations noted above. Compared with CMAQ, WRF/Chem performs better for surface CO, O3, and PM10 concentrations at most sites in most months, for column CO and SO2 abundances, and for AOD. However, it performs worse for surface NOx concentrations at most sites in most months, surface SO2 concentrations at all sites in all months, and column NO2 abundances in January and April. WRF/Chem also gives lower concentrations of most secondary PM species and of black carbon. These differences are attributed to differences between the two models in simulated meteorology, gas-phase chemistry, aerosol thermodynamic and dynamic treatments, dust and sea-salt emissions, and wet and dry deposition treatments.
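The model-observation comparisons above are summarized as normalized mean biases (NMBs). The abstract does not state the exact formula or how samples are paired. The sketch below assumes the conventional definition used in air-quality model evaluation, NMB = 100% × Σ(M − O) / ΣO, computed over paired model (M) and observed (O) values. The function name `normalized_mean_bias` is hypothetical and is not taken from the authors' evaluation code.

```python
from typing import Sequence


def normalized_mean_bias(model: Sequence[float], obs: Sequence[float]) -> float:
    """Return the NMB in percent, 100 * sum(M - O) / sum(O).

    Assumes `model` and `obs` are already paired sample by sample
    (same site and time) and contain no missing values.
    """
    if len(model) != len(obs):
        raise ValueError("model and obs must have the same length (paired samples)")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    # Sum of model-minus-observation differences, normalized by total observed amount
    total_bias = sum(m - o for m, o in zip(model, obs))
    return 100.0 * total_bias / total_obs


if __name__ == "__main__":
    # Toy values only, not data from this study: a model that predicts
    # half of every observed value has an NMB of -50%.
    print(normalized_mean_bias([1.0, 2.0], [2.0, 4.0]))
```

A negative NMB therefore indicates underprediction and a positive NMB indicates overprediction. Because the sum of biases is normalized by the sum of observations, larger observed values carry more weight than they would in an average of per-sample relative errors.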