Abstract

Background: Numerous risk scores, both univariable and multivariable, have been developed for prognostication of a range of clinical outcomes in patients with heart failure (HF) [1,2]. However, these models provide only modest discrimination and often depend on complex test data (e.g., ejection fraction) collected outside of routine clinical care [2]. Furthermore, transparency in reporting of key individual-level metrics (e.g., precision, recall) is notably lacking.

Purpose: We developed and validated a novel deep learning model, the second version of the Transformer-based Risk assessment survival model (TRisk2), for 36-month prediction of all-cause mortality, cardiovascular-related mortality, fatal and non-fatal cardiovascular events, and renal outcomes in patients with HF using routine, linked UK electronic health records (EHR) data.

Methods: An open cohort of 405 thousand patients with HF between 40 and 90 years of age was identified using linked EHR from English general practices: 1,063 practices were used for TRisk2 model development and 355 for external validation. TRisk2 was compared against MAGGIC [2], modified for use on routine linked EHR. Additional analyses compared discriminatory performance in other age groups, by sex, and by baseline disease. All analyses were repeated for prediction of cardiovascular-related mortality, fatal and non-fatal cardiovascular events, and renal outcomes.

Results: TRisk2 demonstrated superior discrimination for all-cause mortality prediction, with a concordance index (C-index) of 0.828 (95% confidence interval [CI]: 0.824 to 0.832), outperforming MAGGIC. The proposed model's performance was stable across analyses stratified by sex, age, and baseline disease status. Both models were well-calibrated overall, and TRisk2 demonstrated greater net benefit than MAGGIC across reasonable decision boundaries in decision curve analyses.
At an exemplar risk threshold of 50% for 36-month prediction, TRisk2 outperformed MAGGIC by 0.08 in both precision and recall and identified 10% more events. At the same threshold for 12-month prediction, TRisk2 outperformed MAGGIC by 0.14 in precision and 0.26 in recall and captured 70% more events. TRisk2 was also well-calibrated and similarly outperformed MAGGIC in prediction of cardiovascular-related mortality, fatal and non-fatal cardiovascular events, and renal outcomes.

Conclusion: Utilising solely routine EHR from primary and secondary care, TRisk2 enabled more accurate prediction than MAGGIC, with a 10-20% gain in C-index across the four outcome investigations. Integration of TRisk2 into routine clinical care could simultaneously eliminate the reliance on complex tests and improve prognostication of key clinical outcomes, thereby refining management of HF.

[Figure: Models' discrimination on four outcomes]
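For readers less familiar with the metrics reported in this abstract, the following is a minimal illustrative sketch of how a concordance index and threshold-based precision and recall can be computed. It is not the authors' evaluation code: the function names, the toy data, and the simplifications (pairs with tied follow-up times are skipped, and censoring is ignored when computing precision and recall) are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient with the earlier observed event has the higher predicted risk.
    Assumption: pairs with tied follow-up times are skipped for simplicity."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times: skipped in this simplified sketch
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def precision_recall_at_threshold(risks, outcomes, threshold=0.5):
    """Precision and recall when patients with predicted risk >= threshold
    are flagged as high risk. Assumption: every patient has a known binary
    outcome within the horizon (censoring is ignored here)."""
    flagged = [r >= threshold for r in risks]
    tp = sum(1 for f, o in zip(flagged, outcomes) if f and o)
    fp = sum(1 for f, o in zip(flagged, outcomes) if f and not o)
    fn = sum(1 for f, o in zip(flagged, outcomes) if not f and o)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical toy data: follow-up in months, event indicator, predicted risk
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.2]

c_index = harrell_c_index(toy_times, toy_events, toy_risks)
precision, recall = precision_recall_at_threshold(toy_risks, toy_events, 0.5)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Precision and recall depend on the chosen risk threshold, which is why the abstract reports them at an explicit exemplar threshold of 50%.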