Abstract

Introduction
Pulmonary vein isolation is superior to antiarrhythmic drugs for rhythm control in patients with atrial fibrillation (AF). Nevertheless, recurrence rates within the first year remain substantial. Several clinical risk scores have been developed to predict AF recurrence; however, despite good results in development cohorts, their performance in external validation is usually poor, which limits their general applicability.

Purpose
We aimed to develop machine learning (ML) models to predict AF recurrence within the first year after catheter ablation and to compare their performance with that of conventional risk scores.

Methods
We used a retrospective dataset from a tertiary hospital center that included all consecutive patients who underwent a first AF ablation between 2017 and 2021. The dataset comprised 76 features: patient characteristics, medical history, baseline echocardiography, calcium score, procedure variables, and early recurrence (ER) during the 90-day blanking period. The outcome was defined as an electrocardiographically documented late recurrence (LR) of atrial tachycardia/AF lasting > 30 s between 3 and 12 months after ablation. Three supervised ML models (penalized logistic regression, random forests, and XGBoost) were developed on the training cohort, with hyperparameters tuned using 10-fold cross-validation. A test set (25% of the data) was held out to estimate final performance. The following risk scores were calculated: APPLE, CAAP-AF, and BASE-AF2 (the last of which includes ER). Areas under the receiver operating characteristic curve (AUC) were compared using DeLong’s test.

Results
A total of 679 patients were included: 62.7% were male, the median age was 59±16 years, and 78.2% had paroxysmal AF. The median time from diagnosis to ablation was 2±4 years. Radiofrequency ablation was performed in 74.2% of patients and cryoablation in the remainder. ER occurred in 12.5%, and most of these patients also experienced LR (68.7% vs 19.6% in those without ER, p<0.001).
LR was observed in 25.6% of patients. The XGBoost model showed the best performance, with an AUC of 0.774 (95% confidence interval [CI] 0.688–0.844), outperforming the existing scores (picture 1): APPLE score, AUC 0.607 (95% CI 0.562–0.653), p<0.001; CAAP-AF score, AUC 0.622 (95% CI 0.576–0.668), p=0.002; BASE-AF2 score, AUC 0.628 (95% CI 0.581–0.675), p=0.003. Variable importance analysis showed a significant drop in model performance when ER was not considered, indicating its high importance in predicting LR (picture 2).

Conclusions
Recurrence during the blanking period was the most important predictor of LR in our population. The ML model was superior to conventional risk scores. ML models might be an essential tool for improving outcome prediction and clinical decision-making on optimal follow-up.

Picture 1: ROC curves
Picture 2: Feature importance
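The abstract does not say which software was used to compare AUCs, so the following is only a minimal sketch of how a DeLong comparison of two correlated AUCs can be computed when both predictors are scored on the same patients. The function names (`placement_values`, `auc`, `delong_test`) are hypothetical, and the demo data are synthetic, not the study data.

```python
import math
import random
from statistics import variance


def placement_values(scores, labels):
    """Return (V10, V01): placement values for positive and negative cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(x, y):
        # 1 if the positive case outranks the negative one, 0.5 on ties, else 0
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def auc(scores, labels):
    """AUC as the mean placement value (equivalent to the Mann-Whitney estimate)."""
    v10, _ = placement_values(scores, labels)
    return sum(v10) / len(v10)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two AUCs measured on the same cases.

    Returns (auc_a, auc_b, z, p). Needs at least two positive and two
    negative cases so that the sample variances are defined.
    """
    v10_a, v01_a = placement_values(scores_a, labels)
    v10_b, v01_b = placement_values(scores_b, labels)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)

    # The variance of the AUC difference equals the variance of the paired
    # placement-value differences, taken separately over positives and negatives.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var_diff = variance(d10) / len(d10) + variance(d01) / len(d01)
    if var_diff == 0:
        raise ValueError("zero variance: the two predictors rank cases identically")

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    # Synthetic illustration only: 170 cases with roughly 25% positives.
    rng = random.Random(0)
    labels = [1 if rng.random() < 0.25 else 0 for _ in range(170)]
    model_a = [1.5 * y + rng.gauss(0, 1) for y in labels]  # stronger predictor
    model_b = [0.3 * y + rng.gauss(0, 1) for y in labels]  # weaker predictor
    auc_a, auc_b, z, p = delong_test(model_a, model_b, labels)
    print(f"AUC A={auc_a:.3f}  AUC B={auc_b:.3f}  z={z:.2f}  p={p:.4f}")
```

The paired form matters here because the ML model and the risk scores are evaluated on the same patients, so their AUC estimates are correlated; treating them as independent would misstate the variance of the difference.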