A systematic methodology for model–data comparisons of sea surface temperature (SST) and mixed layer depth (MLD) is presented using inter-annual simulations over 1980–1998 from an ocean general circulation model (OGCM), the Naval Research Laboratory (NRL) Layered Ocean Model (NLOM). The model–data comparisons performed here are applicable to other OGCMs and are sufficiently detailed to allow easy comparison of results from other models with NLOM, including statistics at specific buoy locations in specific years. There are three model configurations: one coarse resolution (1/2° global) and two fine resolution (1/8° global and 1/16° Pacific). These are used to assess the sensitivity to OGCM resolution, including the impact of increasing non-determinism due to flow instabilities in the 1/8° marginally eddy-resolving and 1/16° fully eddy-resolving simulations. In addition, sensitivity to the choice of atmospheric forcing product is investigated. All three configurations are forced with fields from the European Centre for Medium-Range Weather Forecasts (ECMWF) during 1980–1998; for the 1/16° Pacific configuration only, forcing from the Fleet Numerical Meteorology and Oceanography Center (FNMOC) Navy Operational Global Atmospheric Prediction System (NOGAPS) during 1990–1998 is also used. NOGAPS thermal forcing is available only from 1998 onward. Daily averaged buoy time series from the National Data Buoy Center (NDBC) and the Tropical Atmosphere Ocean (TAO) array are used for inter-annual evaluation of these atmospherically forced OGCMs, which have no assimilation of SST data and no date-specific assimilation of any data type. For model evaluation, several statistical metrics comparing buoy and model time series are calculated: mean error (ME), root-mean-square difference (RMSD), correlation coefficient (R), non-dimensional skill score (SS), and normalized RMSD (NRMSD).
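The five metrics named above can be sketched for a single buoy as follows. This is a minimal illustration, not the NLOM evaluation code, and it rests on stated assumptions: ME is taken as model minus buoy, standard deviations use 1/n normalization, NRMSD is RMSD divided by the buoy standard deviation, and SS takes the commonly used decomposed form SS = R² − (R − σ_m/σ_b)² − (ME/σ_b)². The function `compare` and the `Stats` container are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    me: float     # mean error, model minus buoy (assumed sign convention)
    rmsd: float   # root-mean-square difference
    r: float      # Pearson correlation coefficient
    ss: float     # non-dimensional skill score (assumed form, see lead-in)
    nrmsd: float  # RMSD / buoy standard deviation (assumed normalization)


def compare(model, buoy):
    """Hypothetical helper: compare equal-length daily model and buoy series."""
    n = len(buoy)
    if n != len(model) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    m_mean = sum(model) / n
    b_mean = sum(buoy) / n
    # Population (1/n) standard deviations, so SS = 1 - NRMSD**2 holds exactly.
    sd_m = math.sqrt(sum((x - m_mean) ** 2 for x in model) / n)
    sd_b = math.sqrt(sum((y - b_mean) ** 2 for y in buoy) / n)
    if sd_m == 0 or sd_b == 0:
        raise ValueError("series must not be constant")
    cov = sum((x - m_mean) * (y - b_mean) for x, y in zip(model, buoy)) / n
    me = m_mean - b_mean
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(model, buoy)) / n)
    r = cov / (sd_m * sd_b)
    # Skill score penalizes weak correlation (R**2 term), conditional bias
    # (R - sd_m/sd_b), and unconditional bias (ME/sd_b).
    ss = r ** 2 - (r - sd_m / sd_b) ** 2 - (me / sd_b) ** 2
    nrmsd = rmsd / sd_b
    return Stats(me, rmsd, r, ss, nrmsd)
```

Under these assumptions SS equals 1 − NRMSD², so a positive SS means the model's RMSD at that buoy is smaller than the standard deviation of the buoy series itself.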
SST comparisons to the 340 yearlong daily buoy SST time series spanning 1980–1998 gave median values of −0.09 °C for ME, 0.82 °C for RMSD, 0.92 for R, and 0.73 for SS for the 1/2° global model. Positive SS values, an indication of model success, are found for 286 of the 340 buoys (≈84%). An advantage of using a floating mixed layer approach in the global model is demonstrated for simulation of deep and shallow MLD at a buoy location in the Arabian Sea during the 1994–1995 monsoon period. Model–data comparisons for 1998, when the equatorial Pacific Ocean experienced a sharp transition from a strong El Niño to a strong La Niña, revealed almost no sensitivity to the choice of thermal forcing product. In general, the sensitivity of SST accuracy to model resolution or atmospheric forcing choice was also low, based on comparisons to 194 yearlong daily SST time series during 1990–1998. The median RMSD values are 0.68, 0.61, and 0.84 °C for the 1/2°, 1/8°, and 1/16° ECMWF-forced simulations, respectively, and 0.84 °C for the 1/16° simulation with NOGAPS wind forcing and ECMWF thermal forcing. However, some regional differences exist: median RMSD values are 0.85 °C for the ECMWF-forced vs. 0.88 °C for the NOGAPS-forced simulations against the 127 TAO buoys, and 0.76 °C for the NOGAPS-forced vs. 0.84 °C for the ECMWF-forced simulations against the 67 NDBC buoys, all of which lie outside the equatorial region. Overall, the results of this paper reveal that 6-hourly wind and thermal forcing products exist that are accurate enough for an OGCM to simulate SST and MLD, with simulated SST typically within 1 °C of daily buoy time series, when there is no assimilation of SST data and when model SST is used in the calculation of latent and sensible heat fluxes. These results indicate that the atmospheric forcing products and the OGCM are suitable for assimilation and forecasting of SST over the global ocean.
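Summary figures such as the median ME, RMSD, R, and SS over all buoys, and the number of buoys with SS > 0, come from aggregating per-buoy statistics. The sketch below assumes those statistics are already computed and stored in a dictionary keyed by buoy identifier; that layout and the function name `summarize` are hypothetical. The final line checks the percentage quoted for 286 of 340 buoys.

```python
import statistics


def summarize(per_buoy):
    """Aggregate per-buoy statistics into medians and a count of SS > 0.

    per_buoy: hypothetical mapping of buoy id -> {"me", "rmsd", "r", "ss"}.
    """
    if not per_buoy:
        raise ValueError("no buoy statistics supplied")
    keys = ("me", "rmsd", "r", "ss")
    # Median over all buoys for each metric.
    medians = {k: statistics.median(s[k] for s in per_buoy.values()) for k in keys}
    # Positive SS is treated as an indication of model success at that buoy.
    n_positive = sum(1 for s in per_buoy.values() if s["ss"] > 0)
    return medians, n_positive, n_positive / len(per_buoy)


# Fraction quoted in the text: 286 of 340 buoys with SS > 0.
print(f"{286 / 340:.1%}")  # 84.1%
```

Medians are used rather than means so that a few poorly simulated buoys do not dominate the summary across hundreds of time series.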