Abstract

Background
Deep learning (DL) applied to electrocardiograms (ECGs) is an emerging approach for predicting incident atrial fibrillation or atrial flutter (AF). The generalizability of such methods to a tertiary heart center (THC) population has not yet been formally explored.

Purpose
To evaluate the performance of DL in predicting 5-year incident AF from a resting 12-lead ECG in sinus rhythm acquired at a THC.

Methods
In this retrospective study, we examined 1.4 million ECGs acquired from over 250,000 adults at a THC between 2004 and 2022. ECGs were excluded if they were performed within 30 days of cardiac surgery, captured a rhythm other than sinus rhythm, or were acquired in patients with pre-existing AF. Incident AF at 5 years was modelled as a binary outcome and was ascertained from available outpatient and inpatient clinical databases and ECG diagnoses. Included ECGs were split by randomly assigning patients to the training (70%), validation (10%), and test (20%) sets, so that all ECGs from a given patient fell in the same set. A ResNet-50 model was trained on the training set, its hyperparameters were tuned on the validation set, and the performance of the tuned model is reported on the test set. Patient-level results were derived by averaging the model's predicted probabilities over ECGs grouped by both patient identity and AF outcome. Confidence intervals (CIs) were obtained by bootstrapping. Sensitivity and specificity are reported at a classification threshold selected on the basis of the Matthews correlation coefficient (MCC). Saliency maps were used to enhance the model's explainability.

Results
A total of 669,782 ECGs (47% of the screened ECGs) from 145,323 patients were included. Mean age was 63 ± 15 years, and 62% were male. The 5-year incident AF outcome was observed in 12% of ECGs and 16% of patients.
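The patient-level split and the patient-level aggregation described in the Methods can be sketched in Python as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' pipeline: the function names `split_patients` and `patient_level_scores`, the `(patient_id, af_label, probability)` record layout, and the fixed random seed are all hypothetical. Assigning patients rather than individual ECGs keeps every ECG of a patient in one set, and grouping by both patient and outcome presumably lets one patient contribute separate AF-positive and AF-negative units when their ECGs carry different 5-year labels.

```python
import random
from collections import defaultdict
from statistics import mean


def split_patients(patient_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly assign each unique patient to train/validation/test.

    All ECGs sharing a patient ID end up in the same set, which avoids
    leaking one patient's ECGs across sets. The test set receives the
    patients left over after rounding the train and validation counts.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    rng.shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    assignment = {}
    for i, pid in enumerate(unique):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "validation"
        else:
            assignment[pid] = "test"
    return assignment


def patient_level_scores(ecg_records):
    """Average ECG-level AF probabilities per (patient, outcome) group.

    ecg_records: iterable of (patient_id, af_label, probability) tuples,
    where af_label is 1 if AF occurred within 5 years of that ECG, else 0.
    Returns {(patient_id, af_label): mean probability}.
    """
    groups = defaultdict(list)
    for pid, label, prob in ecg_records:
        groups[(pid, label)].append(prob)
    return {key: mean(probs) for key, probs in groups.items()}


# Example: patient "p1" has two AF-negative ECGs and one AF-positive ECG.
records = [("p1", 0, 0.25), ("p1", 0, 0.75), ("p1", 1, 0.9), ("p2", 0, 0.1)]
print(patient_level_scores(records))  # {('p1', 0): 0.5, ('p1', 1): 0.9, ('p2', 0): 0.1}
```

The aggregated scores, paired with their outcome labels, would then be the units on which a patient-level AUC is computed.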
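The bootstrapped CIs could be computed along the lines of the percentile-bootstrap sketch below, shown here for the AUC. The abstract does not state the number of resamples, the CI method, or whether ECGs or patients were resampled, so `n_boot=1000`, the percentile method, and resampling of individual evaluation units are assumptions; `roc_auc` and `bootstrap_auc_ci` are hypothetical helper names. The AUC is computed from the Mann-Whitney rank statistic, with tied scores receiving average ranks, and resamples containing only one class are skipped because the AUC is undefined for them.

```python
import random


def roc_auc(labels, scores):
    """AUC from the Mann-Whitney U statistic; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores pairs[i..j].
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        n_pos_in_block = sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * n_pos_in_block
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) CI for the AUC."""
    point = roc_auc(labels, scores)  # also checks that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain a single class
            stats.append(roc_auc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

When several ECGs from the same patient are evaluated together, resampling whole patients (a cluster bootstrap) rather than individual ECGs is a common way to account for within-patient correlation.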
The tuned model was first evaluated at the ECG level on the test set, yielding an area under the receiver operating characteristic curve (AUC) of 0.75 (95% CI: 0.745-0.753) and an integrated calibration index of 0.009 (95% CI: 0.008-0.011). When the model was evaluated at the patient level to simulate a deployment scenario, the AUC improved to 0.78 (95% CI: 0.768-0.783), with a sensitivity of 50% (95% CI: 49-52) and a specificity of 87% (95% CI: 86.5-87.3). Stratification by sex showed a higher AUC in females, 0.81 (95% CI: 0.80-0.82), than in males, 0.75 (95% CI: 0.74-0.76). Subgroup analyses revealed a trend toward better patient-level performance as the number of ECGs per patient increased. Saliency maps highlighted the P-wave region as having the greatest influence on the model's predictions, supporting the plausibility of the model's predictions and improving its explainability.

Conclusion
A unimodal, ECG-based deep learning model showed promising performance for predicting 5-year incident AF in an all-comer cohort at a tertiary heart center. Further studies could explore multimodal prediction models that integrate ECGs with other clinical and imaging data.
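The reported sensitivity and specificity depend on the chosen operating point. One way to read "a classification threshold based on the MCC" is to pick the candidate cut-off that maximizes the MCC and then read sensitivity and specificity from the resulting confusion matrix; that interpretation, the use of the observed scores as candidate thresholds, and the helper names below are assumptions. In practice such a threshold would typically be chosen on validation data and applied unchanged to the test set; the abstract does not say which set was used.

```python
import math


def confusion_counts(labels, scores, threshold):
    """Counts (tp, fp, tn, fn) when predicting AF for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_af = s >= threshold
        if predicted_af and y == 1:
            tp += 1
        elif predicted_af:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_operating_point(labels, scores):
    """Return (threshold, mcc, sensitivity, specificity) at the max-MCC cut-off.

    Candidate thresholds are the distinct observed scores, in ascending
    order; on ties, the lowest threshold reached first is kept.
    """
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        m = mcc(tp, fp, tn, fn)
        if best is None or m > best[1]:
            sensitivity = tp / (tp + fn) if tp + fn else float("nan")
            specificity = tn / (tn + fp) if tn + fp else float("nan")
            best = (t, m, sensitivity, specificity)
    return best


labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
print(mcc_operating_point(labels, scores))  # (0.7, 1.0, 1.0, 1.0)
```

Because the MCC uses all four cells of the confusion matrix, it is less dominated by the majority class than accuracy, which is relevant when the outcome occurs in a minority of cases, as with the 12% to 16% AF rate reported here.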