Abstract Disclosure: K. Brewer: None. R. Sisk: None. M. Dapas: None. C. Li: None. A. Dunaif: Consulting Fee; Self; AcaciaBio, Inc, Neurocrine Biosciences, Inc. Speaker; Self; Quest Diagnostics. Other; Self; Co-Editor Endocrine Today, Healio, Slack Inc.
Clustering methods have been used to resolve heterogeneity within complex diseases and to elucidate underlying biologic mechanisms. We (Dapas et al. PLoS Med, 2020) and others have applied these methods to PCOS and identified discrete subtypes, including those that capture distinct reproductive or metabolic features. We objectively compared two widely used methods: hierarchical clustering (HC), which recursively merges subjects based on their similarity, and k-means (Km), which iteratively groups individuals to minimize their distance to cluster centroids. The number of clusters (k) was predefined in both approaches. We compared HC vs Km in 874 European-ancestry PCOS cases diagnosed by NIH criteria, using 8 traits (BMI, testosterone, SHBG, DHEAS, LH, FSH, fasting insulin, fasting glucose) and 9 traits with the addition of AMH. The ConsensusClusterPlus R package was used to evaluate the cluster stability of both approaches with k=2, 3, or 4, using 8 or 9 traits. Genomewide association study (GWAS) meta-analysis was performed as previously reported, with a discovery cohort (620 cases, 2951 controls) and a replication cohort (371 cases, 926 controls), except that genotypes were imputed to the newer TOPMed (r2) panel. Clustering using 8 traits and k=3 resulted in the best cluster stability for both HC and Km (pairwise consensus proportion: HC 0.98, Km 0.93) compared to k=2 with 8 traits (HC 0.88, Km 0.88) or 9 traits (HC 0.95, Km 0.92). Clusters were not stable for k=4 with 8 or 9 traits using either method (range: 0.35-0.66). Km showed slightly better cluster separation than HC (average silhouette width: Km 0.12 vs HC 0.09) with k=3 and 8 traits.
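The average silhouette width used above as the cluster-separation metric can be illustrated with a minimal sketch. For each subject, a is the mean distance to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster; the silhouette is (b - a) / max(a, b), and the average over all subjects ranges from -1 to 1. The function name, the Euclidean distance, the toy points, and the two labelings below are assumptions made for illustration only; they are not the study's data or code.

```python
import math


def average_silhouette_width(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight groups that lie far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A labeling that follows the groups, and one that cuts across them.
good = average_silhouette_width(toy_points, [0, 0, 1, 1])
bad = average_silhouette_width(toy_points, [0, 1, 0, 1])
print(f"matching labels: {good:.3f}, mismatched labels: {bad:.3f}")
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest subjects sit closer to another cluster than to their own. On that scale, the reported widths of 0.12 and 0.09 both correspond to weakly separated clusters.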
The biologic relevance of the three PCOS subtypes, which we have designated as reproductive (cases with higher LH, FSH, SHBG), metabolic (higher BMI, insulin, glucose), and background, was assessed by GWAS, an orthogonal (i.e., uncorrelated) approach for subtype confirmation. The subtypes identified with HC had genomewide significant associations (metabolic subtype: C9orf3/FANCC, rs10761370, minor allele frequency [MAF] 0.47, p = 1.21 × 10⁻⁸; background subtype: FSHB/ARL14EP, rs10835649, MAF 0.17, p = 8.49 × 10⁻¹⁰). There were no genomewide significant signals when the subtypes were identified with Km, despite its better cluster separation. In summary, only HC clusters appeared to capture biologically meaningful differences, since two of the three subtypes thus identified were associated with genomewide significant loci. Neither the addition of AMH nor increasing the number of groups improved clustering metrics. Our results emphasize the importance of independently validating clustering approaches with an orthogonal confirmation strategy such as GWAS. We conclude that HC clustering using 8 traits is superior to Km for resolving the genetic heterogeneity of PCOS.
Presentation: 6/2/2024