Abstract
Background Cardiogenic shock (CS) is a time-critical emergency with high mortality and morbidity, yet prognostication remains challenging: there are neither guidelines nor consensus on the comparative applicability of the risk scores used in CS. We aimed to evaluate the discrimination and calibration of existing risk scores in predicting mortality in CS.
Methods We searched the MEDLINE, Embase, and Scopus databases up to 1 January 2024 for articles developing, redeveloping or validating a multivariable model predicting short-term mortality in adults with CS. Area under the curve (AUC) statistics and observed versus expected (O:E) ratios were used to assess discrimination and calibration, respectively. We conducted random-effects inverse-variance weighted meta-analyses on 6 prevalent, well-validated prediction models (IABP-SHOCK II, CardShock, SAPS II, SOFA, APACHE II and SAVE) to derive pooled AUCs and aggregate O:E ratios for each model. Subgroup analyses were conducted to identify sources of heterogeneity.
Results We included 92 studies (115 study cohorts, 75,228 patients) in our analysis and identified 43 unique prediction models. In the meta-analysis, the CS-specific CardShock score had the best overall discrimination (AUC 0.73; 95% CI 0.70-0.76) and calibration (O:E 1.06; 95% CI 0.79-1.41). SOFA performed second best in discrimination (AUC 0.72; 95% CI 0.68-0.75) but tended to underpredict mortality (O:E 0.85; 95% CI 0.65-1.12). IABP-SHOCK II and APACHE II had comparable discrimination (AUC 0.71); IABP-SHOCK II tended to overpredict mortality (O:E 1.24; 95% CI 0.995-1.55), while APACHE II tended to underpredict it (O:E 0.93; 95% CI 0.57-1.53). In subgroup analyses, CardShock performed best when stratified by mechanical circulatory support type (AUC 0.74-0.76, O:E 0.82-0.99), location (AUC 0.67-0.87, O:E 1.00-1.25) and study type (AUC 0.73-0.97, O:E 0.91-1.99).
Conclusion Among well-validated risk scores, the CardShock score showed the best discrimination and calibration for predicting mortality. Its variable performance across subgroups underscores the need for more research to identify an ideal risk prediction model that accounts for emerging therapies and performs well across multiple clinical scenarios.
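As an illustration of the pooling step named in the Methods, the following is a minimal sketch of a random-effects inverse-variance meta-analysis. It assumes the DerSimonian-Laird estimator for the between-study variance and pools O:E ratios on the log scale; the abstract specifies neither choice. The function names and the cohort values are hypothetical and are not taken from the included studies.

```python
import math


def pool_random_effects(estimates, std_errors, z=1.96):
    """Pool study estimates with inverse-variance weights.

    Assumes the DerSimonian-Laird estimator of between-study variance
    (tau^2); the abstract does not state which estimator was used.
    Returns (pooled estimate, CI lower, CI upper, tau^2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    k = len(estimates)

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_ws
    se_pooled = math.sqrt(1.0 / sum_ws)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


def log_ratio_se(lower, upper, z=1.96):
    """Recover the standard error of a log ratio from its confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical cohorts: (O:E ratio, 95% CI lower, 95% CI upper).
cohorts = [(1.10, 0.85, 1.42), (0.95, 0.70, 1.29), (1.20, 0.90, 1.60)]
log_oe = [math.log(oe) for oe, _, _ in cohorts]
ses = [log_ratio_se(lo, hi) for _, lo, hi in cohorts]

pooled, lo, hi, tau2 = pool_random_effects(log_oe, ses)
print(
    f"pooled O:E {math.exp(pooled):.2f} "
    f"(95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f}), tau^2 = {tau2:.3f}"
)
```

The same function could be applied to AUCs after a suitable transformation (for example the logit), with the pooled value back-transformed afterwards; that choice is likewise an assumption here.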