Abstract
The popularity of cluster analysis in the tourism field has grown massively in recent decades. However, according to our review, researchers are often unaware of the characteristics and limitations of the clustering algorithms they adopt. An important gap in the literature that emerged from our review concerns the adoption of an adequate clustering algorithm for mixed data. The main purpose of this article is to address this gap by describing, both theoretically and empirically, a suitable clustering algorithm for mixed data. Furthermore, this article contributes to the literature by presenting a method to include “Don’t know” answers in the cluster analysis. In conclusion, the main issues related to cluster analysis are highlighted, together with some suggestions and recommendations for future analyses.
Highlights
Cluster analysis is an exploratory description of a multidimensional dataset that aims to identify homogeneous groups of units, as similar as possible within groups and as different as possible between groups (Hennig et al. 2016)
We reviewed the clustering algorithms adopted in travel and tourism articles published in the four leading international tourism research journals over the last 5 years
Only a few studies performed a cluster analysis using mixed data as segmentation variables and, in all of these studies, the clustering algorithm adopted was not appropriate
Summary
Cluster analysis is an exploratory description of a multidimensional dataset that aims to identify homogeneous groups of units, as similar as possible within groups and as different as possible between groups (Hennig et al. 2016). If we believe that the observed units should belong to all clusters simultaneously rather than be constrained to a single cluster, we should adopt an overlapping (fuzzy) clustering algorithm instead of a nonoverlapping (crisp) clustering algorithm. To give another example, if categorical variables are used as segmentation variables, the Euclidean distance is not the best way to define the distance between each pair of units. Researchers should instead use a distance or dissimilarity measure suited to categorical variables, such as the Jaccard similarity index or the Simple Matching coefficient (D’Urso and Massari 2019).
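
To make the two measures named above concrete, the following minimal Python sketch computes the Simple Matching coefficient and the Jaccard similarity index for two units described by binary (0/1) variables. The function names, the example data, and the convention used when neither unit has any 1 are illustrative assumptions, not taken from the article.

```python
def simple_matching(a, b):
    """Share of variables on which two binary vectors agree (1-1 and 0-0 both count)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def jaccard(a, b):
    """Share of joint presences (1-1) among variables where at least one unit has a 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    # Assumption: two all-zero vectors are treated as identical (similarity 1.0);
    # the index is formally undefined in that case.
    return both / either if either else 1.0


# Hypothetical data: five yes/no survey items for two respondents.
tourist_a = [1, 0, 0, 1, 0]
tourist_b = [1, 0, 0, 0, 0]

smc = simple_matching(tourist_a, tourist_b)  # 4 agreements out of 5 items
jac = jaccard(tourist_a, tourist_b)          # 1 joint "yes" out of 2 items with any "yes"

# A dissimilarity for clustering can be obtained as 1 - similarity.
smc_dissimilarity = 1 - smc
jaccard_dissimilarity = 1 - jac
```

The two measures differ in how they treat joint absences: the Simple Matching coefficient counts a shared 0 as agreement, whereas the Jaccard index ignores it, so the choice depends on whether a shared "no" is informative for the segmentation at hand.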