Abstract. The observed warming in the Arctic is more than double the global average, and this enhanced Arctic warming is projected to continue throughout the 21st century. This rapid warming has a wide range of impacts on polar and sub-polar marine ecosystems. One example is the poleward range expansion of coccolithophores, particularly Emiliania huxleyi, during recent decades. Because E. huxleyi plays an essential role in the global carbon cycle, assessing future changes in coccolithophore blooms is very important. A large number of climate models currently provide projections of various oceanographic, meteorological, and biochemical variables in the Arctic. However, individual climate models can have large biases when compared to historical observations. The main goal of this research was to select an ensemble of climate models that most accurately reproduces, relative to reanalysis data, the historical state of the environmental variables that influence E. huxleyi blooms. We developed a novel approach to model selection that includes a diverse set of measures of model skill, including the spatial pattern of some variables, which had not previously been included in a model selection procedure. We applied this method to each of the Arctic and sub-Arctic seas in which E. huxleyi blooms have been observed. Once an optimal combination of climate models that most skilfully reproduces the factors affecting E. huxleyi has been selected, the projections of future Arctic conditions from these models can be used to predict how E. huxleyi blooms will change. Here, we present the validation of 34 CMIP5 (fifth phase of the Coupled Model Intercomparison Project) atmosphere–ocean general circulation models (GCMs) over the historical period 1979–2005.
Furthermore, we propose a procedure for ranking and selecting these models based on their skill in reproducing 10 important oceanographic, meteorological, and biochemical variables in the Arctic and sub-Arctic seas. These variables are the concentrations of nutrients (NO3, PO4, and SI), dissolved CO2 partial pressure (pCO2), pH, sea surface temperature (SST), salinity averaged over the top 30 m (SS30 m), 10 m wind speed (WS), ocean surface current speed (OCS), and surface downwelling shortwave radiation (SDSR). The validation of the GCMs' outputs against reanalysis data includes analysis of the interannual variability, seasonal cycle, spatial biases, and temporal trends of the simulated variables. In total, 60 model subsets were selected, one for each combination of the 10 variables and six study regions, using the selection procedure we present here. The results show that no single model or combination of models has high skill in reproducing the regional climate-relevant features of all considered variables in all target seas. Therefore, an individual subset of models was selected according to our procedure for each combination of variable and Arctic or sub-Arctic sea. The number of selected models in the individual subsets varied from 3 to 11. The paper compares the selected model subsets and the full ensemble of all available CMIP5 models against reanalysis data. The selected subsets generally perform better than the full-model ensemble. We therefore conclude that, for the task addressed in this study, it is preferable to employ the model subsets determined by our procedure rather than the full-model ensemble.
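
To make the idea of skill-based ranking concrete, the following is a minimal illustrative sketch for one variable in one sea. It is not the selection procedure of this paper. It assumes each model and the reanalysis are given as area-averaged monthly series of shape (years, 12), and it scores only mean bias, seasonal cycle, interannual variability, and linear trend, omitting the spatial-pattern measures. The function names `skill_errors` and `select_models`, the equal weighting of metrics, and the fixed subset size `n_keep` are all hypothetical choices made for illustration.

```python
import numpy as np


def skill_errors(model, ref):
    """Return four error measures for one model against a reference.

    Both inputs are monthly series with shape (n_years, 12).
    All measures are absolute errors, so smaller is better.
    """
    model_annual = model.mean(axis=1)
    ref_annual = ref.mean(axis=1)
    years = np.arange(model.shape[0])

    # Mean bias over the whole period
    bias = abs(model.mean() - ref.mean())
    # Seasonal cycle: RMSE between climatological monthly means
    seasonal = np.sqrt(np.mean((model.mean(axis=0) - ref.mean(axis=0)) ** 2))
    # Interannual variability: difference in std of annual means
    variability = abs(model_annual.std() - ref_annual.std())
    # Temporal trend: difference in least-squares slope of annual means
    trend = abs(np.polyfit(years, model_annual, 1)[0]
                - np.polyfit(years, ref_annual, 1)[0])
    return np.array([bias, seasonal, variability, trend])


def select_models(models, ref, n_keep):
    """Rank models by summed, scaled errors and keep the best n_keep.

    `models` maps a model name to its (n_years, 12) array.
    Equal weighting of the four measures is an assumption of this sketch.
    """
    names = list(models)
    errors = np.array([skill_errors(models[name], ref) for name in names])
    # Scale each measure by its largest value across models so that
    # measures with different units contribute comparably.
    scale = errors.max(axis=0)
    scale[scale == 0] = 1.0
    scores = (errors / scale).sum(axis=1)
    order = np.argsort(scores)
    return [names[i] for i in order[:n_keep]]
```

Calling `select_models` once per variable and sea would yield one subset per combination. In practice the subset size would follow from a skill criterion rather than being fixed in advance.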