This paper describes a prototype model evaluation system for studying the capabilities and performance of operational atmospheric transport and dispersion models. The system provides tools both for objective statistical analysis using common performance measures and for more subjective visualisation of the temporal and spatial relationships between model results and field measurements. Supporting the system is a database of processed field experiment data (source terms, meteorological measurements and tracer measurements) from over 100 individual tracer releases. The system is illustrated with results from models used operationally by the Atmospheric Release Advisory Capability (ARAC) emergency response system, as well as from new models currently under development.
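The abstract does not list which "common performance measures" the system computes. As an illustration only, the sketch below assumes three measures widely used in dispersion model evaluation: fractional bias (FB), normalized mean square error (NMSE) and the fraction of predictions within a factor of two of observations (FAC2). It also assumes paired observed and predicted tracer concentrations. The function names are hypothetical and are not taken from the system described here.

```python
from statistics import mean


def fractional_bias(obs, pred):
    """FB = 2 (mean_o - mean_p) / (mean_o + mean_p); 0 means no mean bias.

    Negative values indicate overprediction in this sign convention.
    """
    mo, mp = mean(obs), mean(pred)
    return 2.0 * (mo - mp) / (mo + mp)


def nmse(obs, pred):
    """NMSE = mean((o - p)^2) / (mean_o * mean_p); 0 is a perfect match."""
    mo, mp = mean(obs), mean(pred)
    return mean((o - p) ** 2 for o, p in zip(obs, pred)) / (mo * mp)


def fac2(obs, pred):
    """Fraction of pairs with 0.5 <= p/o <= 2, skipping non-positive observations."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o > 0]
    within = sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0)
    return within / len(pairs)


if __name__ == "__main__":
    # Hypothetical paired concentrations (arbitrary units), for illustration only.
    observed = [1.0, 2.0, 4.0, 8.0]
    predicted = [1.0, 3.0, 4.0, 20.0]
    print(fractional_bias(observed, predicted))
    print(nmse(observed, predicted))
    print(fac2(observed, predicted))  # 3 of 4 pairs within a factor of two
```

These statistics complement one another. FB summarizes systematic over- or underprediction, NMSE is dominated by the largest errors, and FAC2 is a robust count that is insensitive to outliers. This is one reason evaluation studies typically report several measures together rather than just one.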