Atrial fibrillation (AF) is a highly heterogeneous disease, and the association between AF phenotypes and the outcomes of different catheter ablation strategies remains unclear. Conventional classification of AF (e.g., according to duration, atrial size, and thromboembolism risk) does not adequately stratify prognostic risk or guide individualized treatment planning. In recent years, machine learning research has shown that cluster analysis, an unsupervised data-driven approach, can uncover the intrinsic structure of data and identify clusters of patients with pathophysiological similarity. Cluster analysis has been shown to improve the characterization of AF phenotypes and to provide valuable prognostic information. In our cohort of AF inpatients undergoing radiofrequency catheter ablation, we used unsupervised cluster analysis to identify patient subgroups, to compare them with subgroups reported in previous studies, and to evaluate their association with ablation strategies and outcomes.
The participants were AF patients who underwent radiofrequency catheter ablation at West China Hospital between October 2015 and December 2017. All participants were aged 18 years or older, underwent radiofrequency catheter ablation during their hospitalization, and completed follow-up after giving explicit informed consent. Patients with AF of a reversible cause, severe mitral stenosis or a prosthetic heart valve, congenital heart disease, new-onset acute coronary syndrome within three months before the procedure, or a life expectancy of less than 12 months were excluded. The cohort consisted of 1102 participants with paroxysmal or persistent/long-standing persistent AF. Data on 59 variables representing demographics, AF type, comorbidities, therapeutic history, vital signs, electrocardiographic and echocardiographic findings, and laboratory findings were collected.
Overall, data for the variables were rarely missing (<5%), and multiple imputation was used to handle missing data. Follow-up was conducted through outpatient clinic visits or by telephone. Patients were scheduled for follow-up with 12-lead resting electrocardiography and 24-hour Holter monitoring at 3 months and 6 months after the ablation procedure. Early ablation success was defined as the absence of documented AF, atrial flutter, or atrial tachycardia lasting >30 seconds at the 6-month follow-up. Hierarchical clustering was performed on the 59 baseline variables. All variables were standardized to a mean of zero and a standard deviation of one. Initially, each patient was regarded as a separate cluster, and the distances between clusters were calculated. Then, the Ward minimum variance method was used to merge, at each step, the pair of clusters whose union produced the smallest increase in total within-cluster variance. This process continued until all patients formed a single cluster. The "NbClust" package in R, which calculates various statistical indices, including the pseudo-t² index, the cubic clustering criterion, and the silhouette index, was used to determine the optimal number of clusters. The number of clusters chosen most frequently by these indices was selected. A heatmap was generated to illustrate the clinical features of the clusters, and a tree diagram was used to depict the clustering process and the heterogeneity among clusters. Within each cluster, ablation strategies were compared with respect to ablation efficacy.
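The standardization and Ward merging steps described above can be illustrated with a minimal Python sketch. This is not the study's code (the analysis used R and the "NbClust" package); the function names, the toy data, and the hand-picked number of clusters `k` are assumptions made purely for illustration, and the sketch does not reproduce the index-based choice of cluster number.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2

def ward_clusters(rows, k):
    """Start from singletons and merge until k clusters remain."""
    clusters = [{"members": [i], "centroid": list(r)} for i, r in enumerate(rows)]
    while len(clusters) > k:
        # Pick the pair whose merge adds the least within-cluster variance.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]

# Toy data, invented for illustration: two made-up numeric variables per patient.
toy = [[45, 62], [48, 60], [72, 58], [75, 61], [70, 35], [68, 33]]
groups = ward_clusters(standardize(toy), k=3)
print(sorted(groups))
```

The pairwise search is quadratic at every merge, which is acceptable for a toy example but slow for a cohort of about a thousand patients; an established library implementation would be the practical choice there.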
Five statistically driven clusters were identified: 1) the younger age cluster (n=404), characterized by the lowest prevalence of cardiovascular and cerebrovascular comorbidities but the highest prevalence of obstructive sleep apnea syndrome (14.4%); 2) the cluster of elderly adults with chronic diseases (n=438), the largest cluster, showing relatively higher rates of hypertension, diabetes, stroke, and chronic obstructive pulmonary disease; 3) the sinus node dysfunction cluster (n=160), with the highest prevalence of sick sinus syndrome and pacemaker implantation; 4) the heart failure cluster (n=80), with the highest prevalence of heart failure (58.8%) and persistent/long-standing persistent AF (73.7%); 5) the prior coronary artery revascularization cluster (n=20), comprising the oldest patients (median age: 69.0 years), who were predominantly male and all of whom had prior myocardial infarction and coronary artery revascularization. Patients in cluster 2 achieved higher early ablation success with pulmonary vein isolation alone than with extensive ablation strategies (79.6% vs. 66.5%; odds ratio [OR]=1.97, 95% confidence interval [CI]: 1.28-3.03). Although extensive ablation strategies had a slightly higher success rate in the heart failure cluster, the difference was not statistically significant.
This study provided a unique classification of AF patients undergoing catheter ablation by cluster analysis. Age, chronic disease, sinus node dysfunction, heart failure, and a history of coronary artery revascularization contributed to the formation of the five clinically relevant subtypes. These subtypes differed in ablation success rates, highlighting the potential of cluster analysis to guide individualized risk stratification and treatment decisions for AF patients.
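As a rough consistency check on the figures reported for cluster 2, an unadjusted odds ratio can be computed directly from the two success rates. The sketch below assumes the reported OR is the simple ratio of odds (the abstract does not say whether it was adjusted), and the helper name `odds_ratio` is hypothetical. The confidence interval cannot be reproduced here because the per-strategy patient counts are not given.

```python
def odds_ratio(p_a, p_b):
    """Unadjusted odds ratio of success for group A versus group B."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

# Early ablation success in cluster 2:
# pulmonary vein isolation alone (79.6%) vs. extensive ablation (66.5%).
or_cluster2 = odds_ratio(0.796, 0.665)
print(round(or_cluster2, 2))  # 1.97

# The five reported cluster sizes should add up to the cohort size.
cluster_sizes = [404, 438, 160, 80, 20]
print(sum(cluster_sizes))  # 1102
```

Both values agree with the reported OR of 1.97 and the cohort of 1102 participants.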