Climate model evaluation has advanced in both theory and practice, providing strong support for understanding and predicting climate change. However, climate models are still rarely assessed at the weather scale, that is, in terms of daily-scale phenomena such as storm frequency and intensity, even though these variables are central to understanding the impacts of climate change. To assess how well climate models simulate weather-scale patterns, this study uses Self-Organizing Maps (SOMs) to classify weather patterns. To ensure that the assessment is robust, the models are evaluated under different evaluation metrics, numbers of SOM types, study area sizes, and reference datasets. The results show that the size of the study area is positively correlated with the observed differences, and that the evaluation metrics are correlated with one another. The highest correlation is found between evaluation metrics across large-scale and small-scale spatial domains, while the correlation with SOM size is relatively low. This suggests that the choice of evaluation metric has only a minor impact on model ranking. Furthermore, results obtained with the same evaluation metric for regions of different sizes are significantly positively correlated, indicating that the size of the study area does not substantially affect model ranking. Model performance under different SOM sizes is likewise significantly positively correlated. Rankings based on different reference datasets are also highly consistent, which increases confidence in the model ranking.
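The SOM-based comparison described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the SOM is a bare-bones NumPy version, the input fields are synthetic random data standing in for daily reanalysis and model fields, and the function names (`train_som`, `node_frequencies`, `frequency_error`) and the mean-absolute-frequency-difference metric are assumptions made for illustration only.

```python
import numpy as np

def train_som(data, n_rows, n_cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a small SOM on `data` (n_days, n_gridpoints); return node weights.

    Assumes len(data) >= n_rows * n_cols so nodes can start from distinct days.
    """
    rng = np.random.default_rng(seed)
    n_nodes = n_rows * n_cols
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)], dtype=float)
    # Initialise node weights from randomly chosen days.
    weights = data[rng.choice(len(data), size=n_nodes, replace=False)].astype(float)
    sigma0 = max(n_rows, n_cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]      # one randomly drawn daily field
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood weights
        weights += lr * h[:, None] * (x - weights)
    return weights

def node_frequencies(data, weights):
    """Assign each day to its best-matching node; return relative frequencies."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(weights))
    return counts / counts.sum()

def frequency_error(freq_ref, freq_model):
    """Mean absolute difference in node frequency (one possible metric)."""
    return float(np.abs(freq_ref - freq_model).mean())

# Synthetic stand-ins: 500 "days" of a 20-gridpoint field (illustrative only).
rng = np.random.default_rng(42)
reference = rng.normal(size=(500, 20))
fields = {
    "model_A": reference + rng.normal(scale=0.1, size=reference.shape),
    "model_B": rng.normal(loc=0.5, size=(500, 20)),
}

# Train the SOM on the reference data, then project each model onto it.
som = train_som(reference, n_rows=3, n_cols=4)
freq_ref = node_frequencies(reference, som)
errors = {name: frequency_error(freq_ref, node_frequencies(f, som))
          for name, f in fields.items()}
```

Changing `n_rows` and `n_cols` corresponds to varying the number of SOM types, and replacing `frequency_error` with another function corresponds to changing the evaluation metric; repeating the calculation over such choices gives the set of rankings whose consistency is examined.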
When the influences of evaluation metric, SOM size, and reanalysis dataset on the performance assessment are considered together, significant variations in model ranking are observed. Based on cumulative ranking, the top five models are ACCESS1-0, GISS-E2-R, GFDL-CM3, MIROC4h, and GFDL-ESM2M. In conclusion, the choice of evaluation metric, the size of the study area, and the SOM size should all be considered when ranking models. Weather pattern classification plays a crucial role in climate model evaluation: it helps to characterize model performance in different weather systems, to assess the ability of models to simulate extreme weather events, and to improve the design and evaluation of model ensemble predictions. These findings can help to refine and strengthen climate model evaluation methods and provide useful insights for future research.
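As a rough illustration of how rankings from different configurations might be compared and combined, the sketch below ranks models within each configuration, computes the Spearman rank correlation between configurations, and sums the ranks into a cumulative ranking. The model names, configuration labels, and error scores are invented placeholders, not results from this study, and the rank-sum definition of "cumulative ranking" is an assumption.

```python
from itertools import combinations

def rank_models(scores):
    """Rank models by error score (1 = lowest error); assumes no tied scores."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same models."""
    n = len(rank_a)
    d2 = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def cumulative_ranking(rankings):
    """Sum each model's rank over all configurations; lowest total comes first."""
    models = next(iter(rankings.values()))
    totals = {m: sum(r[m] for r in rankings.values()) for m in models}
    return sorted(totals, key=totals.get), totals

# Hypothetical error scores per configuration (metric / SOM size); lower is better.
scores = {
    "metric1_small_som": {"model_A": 0.12, "model_B": 0.30, "model_C": 0.21,
                          "model_D": 0.45, "model_E": 0.27},
    "metric2_small_som": {"model_A": 0.10, "model_B": 0.33, "model_C": 0.25,
                          "model_D": 0.40, "model_E": 0.22},
    "metric1_large_som": {"model_A": 0.15, "model_B": 0.28, "model_C": 0.19,
                          "model_D": 0.50, "model_E": 0.31},
}

rankings = {cfg: rank_models(s) for cfg, s in scores.items()}

# Pairwise rank correlation between configurations.
correlations = {(a, b): spearman(rankings[a], rankings[b])
                for a, b in combinations(rankings, 2)}

order, totals = cumulative_ranking(rankings)
print(order)  # ['model_A', 'model_C', 'model_E', 'model_B', 'model_D']
```

With these placeholder scores the pairwise correlations are high but not identical, which mirrors the situation described above: individual rankings shift between configurations, so a cumulative ranking is used to summarize them.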