Abstract

Background
Heart failure (HF) risk prediction models combine multivariable patient data to estimate an individual's risk of developing HF. By identifying at-risk and early-stage patients, such models may enable earlier intervention to prevent or delay the onset of HF. Previous systematic reviews could not recommend any existing prediction model for clinical use. They cited insufficient evidence and, at the time of their publication, a lack of guidelines for appraising study quality.

Purpose
To summarize the performance of risk prediction models for incident HF and to identify models for further validation and potential clinical use.

Methods
We searched MEDLINE and EMBASE in June 2021 for English-language studies developing or validating HF risk prediction models. We also retrieved studies from two previous systematic reviews. We narratively summarized model characteristics (e.g. model type, predictors used, prediction horizon) and study methodology (e.g. validation methods). Performance was assessed for all models validated in ≥1 cohort. For all models validated in ≥2 cohorts, we pooled discrimination measures using random-effects meta-analyses. Calibration was summarized descriptively from individual study results, drawing on reported statistical tests and digitized calibration plots. Study quality was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results
Of 18,937 publications screened, 41 studies comprising 120 prediction models were included. Twenty models were both derived and validated, 99 were only derived, and 1 was only validated. Risk of bias was rated high in nearly all (94.7%) PROBAST assessments, mostly because of concerns in the analysis domain. Among the 21 models validated in ≥1 cohort, most showed moderate (61.9%; C-statistic 0.7 to <0.8) or high (23.8%; C-statistic 0.8 to <0.9) discrimination. Calibration was adequate in patients with low predicted risk (<10%).
Nine (42.9%) of these 21 models were presented as web-based calculators and five (23.8%) as points-based risk scores. Based on performance, number of validation cohorts, study risk of bias, and user-friendliness, the Atherosclerosis Risk in Communities (ARIC), Multi-Ethnic Study of Atherosclerosis (MESA), Pooled Cohort equations to Prevent Heart Failure (PCP-HF), and Health ABC models emerged as the most promising risk scores for clinical practice.

Conclusions
Given these models' acceptable performance but high risk of bias, future studies should focus on their external validation in studies of high methodological rigor. Models should be validated in more diverse patient populations, particularly with respect to race. Impact analyses assessing how clinical implementation of these models affects patient outcomes are also required before routine use. Once validated, these models may help guide clinical decision-making to prevent the onset of HF through early, aggressive risk factor modification.

[Figure: Discrimination in 21 validated models; Characteristics of recommended models]
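The Methods state that discrimination measures were pooled with random-effects meta-analyses for models validated in at least two cohorts, but the abstract does not name the estimator or the scale. The sketch below is a minimal illustration under stated assumptions. It uses the DerSimonian-Laird estimator for between-cohort variance, pools C-statistics on the logit scale, and back-calculates each cohort's standard error from its reported 95% confidence interval. These choices are assumptions, not the review's documented procedure. The function names (`pool_c_statistics`, `discrimination_band`) are hypothetical, and the cohort values in the usage example are made up for illustration and are not data from the review.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(estimates):
    """Random-effects (DerSimonian-Laird) pooling of C-statistics on the logit scale.

    Assumption: each cohort reports (c, ci_lower, ci_upper) with a 95% CI.
    Returns (pooled_c, pooled_lower, pooled_upper, tau2); tau2 is on the logit scale.
    """
    if len(estimates) < 2:
        raise ValueError("pooling requires at least two validation cohorts")
    ys, vs = [], []
    for c, lower, upper in estimates:
        ys.append(logit(c))
        # Back-calculate the logit-scale standard error from the 95% CI width.
        se = (logit(upper) - logit(lower)) / (2 * Z_95)
        vs.append(se ** 2)
    k = len(ys)
    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    # DerSimonian-Laird between-cohort variance, truncated at zero.
    denom = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / denom)
    # Random-effects weights add tau2 to each within-cohort variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sw_re
    se_re = math.sqrt(1.0 / sw_re)
    # Transform the pooled estimate and its interval back to the C-statistic scale.
    return (inv_logit(y_re), inv_logit(y_re - Z_95 * se_re),
            inv_logit(y_re + Z_95 * se_re), tau2)


def discrimination_band(c: float) -> str:
    """Label a C-statistic using the two bands named in the abstract."""
    if 0.7 <= c < 0.8:
        return "moderate"
    if 0.8 <= c < 0.9:
        return "high"
    return "unlabeled"  # the abstract does not name bands outside 0.7 to <0.9


if __name__ == "__main__":
    # Hypothetical cohort results for illustration only; not data from the review.
    cohorts = [(0.78, 0.75, 0.81), (0.74, 0.70, 0.78), (0.80, 0.77, 0.83)]
    pooled, lower, upper, tau2 = pool_c_statistics(cohorts)
    print(f"pooled C = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f}), "
          f"tau^2 = {tau2:.4f}, band = {discrimination_band(pooled)}")
```

Pooling on the logit scale keeps the pooled estimate and its interval within (0, 1). Pooling on the raw C-statistic scale, or using another between-cohort variance estimator such as REML, would also be defensible choices and could give slightly different pooled values.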