Abstract

Study question
Is a clustering model more accurate than phenotype classification in predicting live birth in women with polycystic ovary syndrome (PCOS) undergoing in vitro fertilization (IVF)?

Summary answer
Novel PCOS clusters, identified from patients’ endocrine and metabolic profiles, predict live birth rates in women undergoing IVF more accurately than the traditional phenotype classification.

What is known already
PCOS is a heterogeneous condition that affects fertility and IVF success. It is classified based on three factors: hyperandrogenism (HA), ovulatory dysfunction (OD), and polycystic ovarian morphology (PCOM). The ESHRE/ASRM classification system identifies four phenotypes: A (all three factors), B (HA and OD), C (HA and PCOM), and D (OD and PCOM). Although this classification is widely used, it focuses solely on clinical signs and symptoms and neglects underlying endocrine and metabolic factors, which may limit its effectiveness in predicting fertility outcomes in PCOS patients. Classifying PCOS patients may therefore require a more comprehensive assessment.

Study design, size, duration
This prospective cohort study was conducted from June 2020 to August 2022 at a tertiary IVF center in Ho Chi Minh City, Vietnam. Clustering models were built on the endocrine and metabolic profiles of 731 PCOS patients, and the resulting novel clusters were compared with the traditional phenotype classification. The primary outcome was the live birth rate.

Participants/materials, setting, methods
Patients aged 18–45 years who were diagnosed with PCOS according to the Rotterdam 2003 criteria and were undergoing IVF treatment were included. Patients with hypothyroidism, Cushing’s syndrome, congenital adrenal hyperplasia, or contraindications to hormonal treatment were excluded. We employed Gaussian Mixture Models for clustering, using endocrine and metabolic data as the key features.
We then compared the accuracy of the clustering model with that of the phenotype classification in predicting IVF outcomes using a logistic regression model.

Main results and the role of chance
We identified three distinct clusters among PCOS patients. Cluster 0 (n = 298) was characterized by significant insulin resistance and dyslipidemia, Cluster 1 (n = 219) displayed the lean PCOS phenotype, and Cluster 2 (n = 214) had moderate insulin resistance with severe hyperandrogenism. Clusters 1 and 2 had a higher cumulative live birth rate (CLBR) than Cluster 0 (39.7% and 37.9% versus 28.5%, p = 0.015) and lower gestational diabetes rates (8.68% and 6.07% versus 12.4%, p = 0.048). The prevalence of hypertensive disorders during pregnancy varied across clusters (3.69% in Cluster 1, 0.46% in Cluster 2, and 2.8% in Cluster 0), although the difference did not reach statistical significance (p = 0.059). In contrast, the traditional phenotype classification did not show significant differences in CLBR, gestational diabetes, or hypertensive disorders. In logistic regression, the cluster-based method predicted CLBR more accurately, with an AUC of 0.67 (95% CI: 0.57–0.76) compared to 0.49 (95% CI: 0.41–0.59) for the traditional phenotype classification (p = 0.001).

Limitations, reasons for caution
This is a single-center study conducted on Vietnamese patients, so its generalizability may be limited. The predictive value of the novel clustering for IVF outcomes in PCOS patients requires further validation in prospective studies.

Wider implications of the findings
This study suggests a potential shift towards personalized IVF treatment strategies for PCOS, using machine learning-based PCOS clusters identified from patients’ endocrine and metabolic profiles.

Trial registration number
NCT04364087
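
Illustrative sketch of the analysis workflow
The workflow named under Participants/materials, setting, methods (Gaussian mixture clustering followed by an AUC comparison) can be sketched roughly as below. This is not the study's code. The data are synthetic, the two features and the outcome probabilities are invented, the diagonal-covariance EM routine is a simplification chosen to keep the example dependency-free, and the AUC is computed in-sample. The score used is each cluster's observed event rate, which gives the same ranking as a logistic regression whose only predictor is cluster membership; the study's actual model specification is not given in the abstract.

```python
import math
import random

def fit_diag_gmm(X, k, n_iter=100, seed=0):
    """EM for a Gaussian mixture with diagonal covariances (illustrative only)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [list(x) for x in rng.sample(X, k)]  # start means at random data points
    centre = [sum(x[j] for x in X) / n for j in range(d)]
    base_var = [sum((x[j] - centre[j]) ** 2 for x in X) / n + 1e-6 for j in range(d)]
    variances = [list(base_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in X:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)  # subtract the max for numerical stability
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means, and variances
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            for j in range(d):
                mu = sum(r[c] * x[j] for r, x in zip(resp, X)) / nc
                means[c][j] = mu
                variances[c][j] = sum(r[c] * (x[j] - mu) ** 2 for r, x in zip(resp, X)) / nc + 1e-6
    # Hard assignment from the responsibilities of the last E-step
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, weights, means

def auc(scores, outcomes):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic cohort: three invented groups with two hypothetical features
# (think "insulin-resistance index" and "androgen level") and made-up outcome rates.
rng = random.Random(42)
groups = [((3.0, 1.0), 0.30), ((1.0, 1.0), 0.40), ((2.0, 3.0), 0.40)]
X, y = [], []
for (m1, m2), p_event in groups:
    for _ in range(100):
        X.append([rng.gauss(m1, 0.4), rng.gauss(m2, 0.4)])
        y.append(1 if rng.random() < p_event else 0)

labels, weights, means = fit_diag_gmm(X, k=3)

# Score each patient by the observed event rate of her assigned cluster
rate = {}
for c in set(labels):
    ys = [yi for yi, li in zip(y, labels) if li == c]
    rate[c] = sum(ys) / len(ys)
scores = [rate[l] for l in labels]
cluster_auc = auc(scores, y)
print("cluster sizes:", {c: labels.count(c) for c in sorted(set(labels))})
print("in-sample AUC of cluster-based score:", round(cluster_auc, 3))
```

The same `auc` function could be applied to a score derived from phenotype labels (A to D) to compare the two classifications. In practice a library implementation such as scikit-learn's `GaussianMixture` with full covariances, and an AUC estimated on held-out data, would usually be preferable to this hand-rolled version.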