Abstract

This paper reports the results of a study of the partitioning around medoids (PAM) clustering algorithm applied to four datasets of varying sizes and numbers of clusters, each in both standardized and unstandardized form. Object-object similarity was computed with the angular distance proximity measure in addition to two more traditional proximity measures, namely the Euclidean distance and the Manhattan distance. The data used in the study comprise three widely available datasets and one that was constructed from publicly available climate data. The results replicate some well-known facts about the PAM algorithm: the quality of the generated clusters tends to be much better for small datasets; the silhouette value is a good, even if not perfect, guide to the optimal number of clusters to generate; and human intervention is required to interpret the generated clusters. The results also indicate that the angular distance measure, which has not traditionally been widely used in clustering, outperforms both the Euclidean and Manhattan distance metrics in certain situations.

Keywords: PAM, Euclidean, Manhattan, Angular distance, Silhouette

Highlights

  • Cluster analysis is an unsupervised machine learning task used to find structure in unlabelled data

  • Interpretation of generated clusters often requires human intervention to explain patterns that are common to members of the clusters

Summary

Introduction

Cluster analysis (or clustering) is an unsupervised machine learning task used to find structure in unlabelled data. The clustering task groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other clusters (Aldenderfer and Blashfield, 1984; Han et al., 2006). Several clustering approaches have been developed to address different types of data. These include partitioning approaches, hierarchical approaches, density-based methods, grid-based methods, model-based methods, special techniques for clustering high-dimensional data, and constraint-based clustering (Han et al., 2006; Yinghua et al., 2016).
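
How similar two objects are depends on the proximity measure chosen. As a rough illustration, the three measures named in the abstract can be sketched in plain Python as follows. This is a minimal sketch, not the study's implementation: the function names are hypothetical, and the angular distance is assumed here to be the angle in radians between the two vectors (the arccosine of their cosine similarity), which may differ from the exact definition used in the paper.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def angular(a, b):
    """Angle in radians between two vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Two vectors pointing in the same direction but with different magnitudes.
p, q = (3.0, 4.0), (6.0, 8.0)
d_euclidean = euclidean(p, q)   # 5.0
d_manhattan = manhattan(p, q)   # 7.0
d_angular = angular(p, q)       # 0.0, because p and q are parallel
```

The example vectors show why the choice of measure matters: the Euclidean and Manhattan distances treat p and q as clearly separated, while the angular distance treats them as identical because it ignores magnitude and compares direction only.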
