Abstract

Introduction
Heart failure prognosis is an active research area, with several new risk scores developed each year. Although deriving and evaluating risk scores has become common practice, these processes currently lack standardisation.

Purpose
We aimed to: i) develop a standards-based software tool to support prognostic modelling; and ii) apply the tool to derive, evaluate and compare exemplar heart failure risk models, addressing the hypothetical question of whether using only non-invasive measurements provides prognostic value similar to that obtained when invasive measurements are added.

Methods
A software tool was developed based on a prognostic modelling workflow defined using the Common Workflow Language and containing 7 steps: univariate Cox proportional hazards (PH) regression, multivariable Cox PH regression with assessment of proportional hazards and linearity with outcome (Schoenfeld and Martingale residuals), derivation of points per risk factor, risk calculation and stratification, discrimination and calibration assessment. Using data from the PEOPLE heart failure cohort [1], two integer-based risk scores were developed to predict 2-year all-cause mortality. The first model used 12 variables available at recruitment, informed by prior analysis of the PEOPLE cohort, to provide a benchmark for comparison; the second model was based on 9 non-invasive parameters. The analysis was conducted on cases with complete data (n=781), with no internal or external validation.

Results
Given the dataset and the variables of interest for each model, our software tool automatically generated the patient-specific risk scores, risk strata, c-index, and discrimination and calibration plots. The 12-parameter model, based on age, sex, ejection fraction, hypertension, diabetes, atrial fibrillation, ischaemic history, NYHA class, heart rate, systolic blood pressure, creatinine and NT-proBNP, yielded reasonable discrimination (c-index=0.77).
Graphical assessment of risk strata and calibration (Fig. 1.a and c) revealed good performance overall, except in the highest-risk decile, where survival was overestimated. The reduced 9-parameter model excluded creatinine and NT-proBNP, and diabetes status was automatically dropped for non-linearity. This model showed a substantial drop in discrimination (c-index=0.67), which was confirmed by graphical assessment of risk strata (Fig. 1.b). Calibration was more variable per decile than with the 12-parameter model but was more robust to survival overestimation (Fig. 1.d).

Conclusion
By deriving both risk models through a common statistical process, our software tool ensures that they are uniform and comparable. In this example, we showed that adding NT-proBNP, creatinine, and diabetes status improves discrimination for predicting survival in heart failure. The tool can be used to test new hypotheses in a standardised and reproducible manner to enhance research in heart failure prognosis.
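
The points derivation and discrimination steps named in Methods can be illustrated with a minimal sketch. It is not the tool's implementation: the abstract does not state how points are derived, so the rule used here (dividing each Cox coefficient by a reference coefficient and rounding) is an assumption, as are the coefficient values, the variable names and the four toy patients. The c-index function is a simplified Harrell's estimator that skips pairs with tied follow-up times.

```python
from itertools import combinations


def derive_points(coefficients, reference):
    """Turn Cox log-hazard coefficients into integer points.

    Assumed rule: divide each coefficient by the reference coefficient
    and round to the nearest integer.
    """
    base = coefficients[reference]
    return {name: round(beta / base) for name, beta in coefficients.items()}


def risk_score(patient, points):
    """Sum the points of the risk factors present (coded 0/1) in a patient."""
    return sum(points[name] * value for name, value in patient.items())


def harrell_c_index(times, events, scores):
    """Simplified Harrell's c-index; a higher score should mean earlier death.

    A pair is comparable when the patient with the shorter follow-up had an
    observed event. Pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter follow-up
        if times[i] == times[j] or not events[i]:
            continue  # tied times, or shorter follow-up was censored
        comparable += 1
        if scores[i] > scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5  # tied scores count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical coefficients, NOT estimates from the PEOPLE cohort.
coefficients = {
    "age_over_75": 0.30,
    "nyha_class_iii_iv": 0.62,
    "atrial_fibrillation": 0.29,
}
points = derive_points(coefficients, reference="age_over_75")

# Four made-up patients with binary risk factors.
patients = [
    {"age_over_75": 1, "nyha_class_iii_iv": 1, "atrial_fibrillation": 1},
    {"age_over_75": 1, "nyha_class_iii_iv": 0, "atrial_fibrillation": 0},
    {"age_over_75": 0, "nyha_class_iii_iv": 0, "atrial_fibrillation": 1},
    {"age_over_75": 0, "nyha_class_iii_iv": 0, "atrial_fibrillation": 0},
]
times = [6, 20, 12, 24]               # follow-up in months
events = [True, False, True, False]   # True = died, False = censored

scores = [risk_score(p, points) for p in patients]
c_index = harrell_c_index(times, events, scores)
print(points, scores, c_index)
```

Keeping the points rule and the c-index as separate functions mirrors the idea of discrete workflow steps: each model's variable list can be run through the same functions, which is what makes the resulting c-index values comparable.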