Results are described from a series of 40 retrospective forecasts of tropical Pacific SST, initialized on 1 January and 1 July of each year from 1980 to 1999, performed with several coupled ocean–atmosphere general circulation models sharing the same ocean general circulation model (OGCM), the Modular Ocean Model version 3 (MOM3), and the same initial conditions. The atmospheric components of the coupled models were the Center for Ocean–Land–Atmosphere Studies (COLA), ECHAM, and Community Climate Model version 3 (CCM3) models at T42 horizontal resolution, and no empirical corrections were applied to the coupling. Additionally, the retrospective forecasts using the COLA and ECHAM atmospheric models were carried out with two resolutions of the OGCM. The high-resolution version of the OGCM had 1° horizontal resolution (1/3° meridional resolution near the equator) and 40 levels in the vertical, while the lower-resolution version had 1.5° horizontal resolution (1/2° meridional resolution near the equator) and 25 levels. The initial states were taken from an ocean data assimilation performed by the Geophysical Fluid Dynamics Laboratory (GFDL) using the high-resolution OGCM. Initial conditions for the lower-resolution retrospective forecasts were obtained by interpolation from the GFDL ocean data assimilation. The systematic errors of the mean evolution in the coupled models depend strongly on the atmospheric model, with the COLA versions having a warm bias in tropical Pacific SST, the CCM3 version a cold bias, and the ECHAM versions a smaller cold bias. The models exhibit similar levels of skill, although some statistically significant differences are identified. Retrospective forecasts from the 1 July initial conditions perform better than those from 1 January, suggesting a spring prediction barrier. A consensus retrospective forecast, produced by averaging the retrospective forecasts from all of the models, is generally superior to any of the individual retrospective forecasts.
One reason that averaging across models appears to be successful is that the averaging reduces the effects of systematic errors in the structure of the ENSO variability of the different models. The effect of reducing noise by averaging ensembles of forecasts made with the same model is compared with the effect of multimodel ensembling for a subset of the cases; however, the sample size is not large enough to clearly distinguish between the multimodel consensus and the single-model ensembles. There are obvious problems with the retrospective forecasts that can be connected to the various systematic errors of the coupled models in simulation mode and that are ultimately due to model error (errors in the physical parameterizations and numerical truncation). These errors lead to initial shock and a “spring variability barrier” that degrade the retrospective forecasts.
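As an illustration of the multimodel consensus described above, the following is a minimal sketch in Python with NumPy, not the procedure used in the study. It assumes that each model's forecasts for one start month are stored as an array of shape (start years, lead months), that the systematic error of the mean evolution is removed by subtracting each model's lead-dependent mean over the start years, and that the models are weighted equally. The function names, the 12-month lead range, the bias values, and the synthetic data are all hypothetical.

```python
import numpy as np


def remove_mean_drift(forecasts):
    """Subtract the mean over start years at each lead time.

    forecasts: array of shape (n_starts, n_leads).
    Assumption: this lead-dependent mean stands in for the model's
    systematic error in the mean evolution.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    return forecasts - forecasts.mean(axis=0, keepdims=True)


def consensus_forecast(model_forecasts):
    """Equal-weight average of drift-corrected anomalies across models.

    model_forecasts: dict mapping a model name to an array of shape
    (n_starts, n_leads).
    """
    anomalies = [remove_mean_drift(f) for f in model_forecasts.values()]
    return np.mean(anomalies, axis=0)


def anomaly_correlation(forecast_anom, observed_anom):
    """Correlation over start years between forecast and observed
    anomalies, computed separately for each lead time."""
    f = forecast_anom - forecast_anom.mean(axis=0)
    o = observed_anom - observed_anom.mean(axis=0)
    return (f * o).sum(axis=0) / np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))


# Synthetic demonstration only: none of these numbers come from the study.
rng = np.random.default_rng(0)
n_starts, n_leads = 20, 12  # 20 start years (1980-99); 12 leads is assumed
truth = rng.normal(size=(n_starts, n_leads))

# Hypothetical mean biases: one warm model, one cold, one weakly cold.
biases = {"warm_model": 1.0, "cold_model": -1.5, "weak_cold_model": -0.5}
forecasts = {
    name: truth + bias + rng.normal(scale=1.0, size=truth.shape)
    for name, bias in biases.items()
}

consensus = consensus_forecast(forecasts)
consensus_skill = anomaly_correlation(consensus, remove_mean_drift(truth))
single_skill = {
    name: anomaly_correlation(remove_mean_drift(f), remove_mean_drift(truth))
    for name, f in forecasts.items()
}
```

Comparing `consensus_skill` with the entries of `single_skill` gives one correlation per lead time for the consensus and for each individual model. With only 20 start years per start month, differences between these correlations carry large sampling uncertainty, which is consistent with the caution about sample size expressed above.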