Abstract
Hierarchical and k‐medoids clustering are deterministic clustering algorithms defined on pairwise distances. We use these same pairwise distances in a novel stochastic clustering procedure based on a probability distribution. We call our proposed method CaviarPD, a portmanteau from cluster analysis via random partition distributions. CaviarPD first samples clusterings from a distribution on partitions and then finds the best cluster estimate based on these samples using algorithms to minimize an expected loss. Using eight case studies, we show that our approach produces results as close to the truth as hierarchical and k‐medoids methods, and has the additional advantage of allowing for a probabilistic framework to assess clustering uncertainty. The method provides an intuitive graphical representation of clustering uncertainty through pairwise probabilities from partition samples. A software implementation of the method is available in the CaviarPD package for R.
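The two-step idea described above (sample partitions, then pick an estimate that minimizes an expected loss) can be illustrated with a minimal Python sketch. It is not the CaviarPD implementation or its R interface. The abstract does not name the loss function or the search algorithm, so this sketch assumes the Binder loss with equal costs and searches only among the sampled partitions. The function names and the toy `samples` list are hypothetical and stand in for draws from a partition distribution.

```python
from itertools import combinations


def pairwise_probabilities(samples):
    """Estimate, for every pair of items, the fraction of sampled
    partitions in which the two items share a cluster."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]


def expected_binder_loss(labels, probs):
    """Expected Binder loss (equal costs, an assumption of this sketch)
    of a candidate partition given the pairwise probabilities."""
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        together = 1.0 if labels[i] == labels[j] else 0.0
        total += abs(together - probs[i][j])
    return total


def best_sampled_partition(samples):
    """Return the sampled partition with the smallest expected loss,
    together with the pairwise probability matrix."""
    probs = pairwise_probabilities(samples)
    best = min(samples, key=lambda labels: expected_binder_loss(labels, probs))
    return best, probs


# Hypothetical partition samples for four items (cluster labels per item).
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

best, probs = best_sampled_partition(samples)
print(best)
print(probs[2][3])
```

The matrix `probs` is the kind of quantity the abstract refers to as pairwise probabilities from partition samples: values near 0 or 1 indicate pairs whose co-clustering is nearly certain, while intermediate values flag uncertain pairs.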
More From: Statistical Analysis and Data Mining: The ASA Data Science Journal