A comparative user study of visualization techniques for cluster analysis of multidimensional data sets

Elio Ventocilla, Maria Riveiro

doi:10.1177/1473871620922166

Open Access

Abstract

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward’s hierarchical clustering) are likely to lead to estimates of k that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-Stochastic Neighbor Embedding is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-Stochastic Neighbor Embedding; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections.
It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.
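To make the idea of a k value "given by a cluster quality measure" concrete, the following is a minimal sketch, not the study's actual procedure. It assumes the mean silhouette width as the quality measure, Ward's criterion for the hierarchical clustering, and a small synthetic four-dimensional data set with three well-separated groups. All function names (ward_clusters, silhouette, estimate_k) and the data are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    dims = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # combinations() yields index pairs with i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def silhouette(clusters):
    """Mean silhouette width over all points (singletons score 0)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # Mean distance to the other members of the same cluster.
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # Mean distance to the nearest other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def estimate_k(points, candidates):
    """Pick the candidate k whose Ward partition has the best silhouette."""
    return max(candidates, key=lambda k: silhouette(ward_clusters(points, k)))


# Hypothetical data: three tight groups in four dimensions.
centers = [(0, 0, 0, 0), (10, 0, 10, 0), (5, 9, 0, 9)]
offsets = [(0, 0), (-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(c[0] + dx, c[1] + dy, c[2], c[3]) for c in centers for dx, dy in offsets]

estimated = estimate_k(points, range(2, 7))
print("estimated k:", estimated)
```

A single index like this returns one number with no visual context. The study compares estimates of this kind against what participants perceive in each projection.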

Highlights

Visualizing the structure of a data set can be seen as an initial step toward gaining an understanding of the problem space represented by the data itself.
We investigate the effects that the selected multidimensional projections (MDPs) have on user-driven estimations of k, their perceived usability for the task of estimating k, and whether they lead to an implicit agreement with the estimates given by NbClust.
The results presented show that the local methods analyzed, Laplacian Eigenmaps (LE) and Locally Linear Embedding (LLE), are more likely to be influenced by small changes in both data and parameters, and they tend to provide cluttered visualizations, whereas data points in t-Stochastic Neighbor Embedding (t-SNE), Isomap, and Principal Component Analysis (PCA) are more scattered. t-SNE, due to the nature of its gradient, tends to form small clusters.

Summary

Introduction

Visualizing the structure of a data set can be seen as an initial step toward gaining an understanding of the problem space represented by the data itself. Scatter plots and scatter plot matrices are common examples for visually encoding data sets with dimensionalities between two and twelve.[3] For higher dimensional data sets, MDPs may rely on two types of unsupervised machine learning (ML) techniques: dimensionality reduction (DR) and clustering. Both take a multidimensional data set as input and may produce an output that can later be plotted using a visual encoder (VE), for example, a scatter plot or a dendrogram. We argue that such would be the case of t-SNE: its behavior as a general DR technique is uncertain,[17] which is why its authors presented it as a visualization technique.
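The pipeline described above (multidimensional data in, a DR step, then a visual encoder) can be sketched as follows. This is only an illustration under stated assumptions: it uses PCA, computed with plain power iteration and deflation, as the DR technique, and it stops at the two-dimensional coordinates that a scatter plot would encode. The helper names (pca_2d, top_eigenvector) and the synthetic data are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]


def center(data):
    """Subtract the per-dimension mean from every row."""
    n, dims = len(data), len(data[0])
    means = [sum(row[d] for row in data) / n for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def top_eigenvector(matrix, iterations=200):
    """Dominant eigenvector of a symmetric matrix via power iteration."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(len(matrix))]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def pca_2d(data):
    """Project rows onto the first two principal components."""
    centered = center(data)
    cov = covariance(centered)
    dims = len(cov)
    v1 = top_eigenvector(cov)
    lam1 = dot(v1, mat_vec(cov, v1))  # Rayleigh quotient: first eigenvalue
    # Deflation: remove the first component, then find the next one.
    deflated = [
        [cov[i][j] - lam1 * v1[i] * v1[j] for j in range(dims)]
        for i in range(dims)
    ]
    v2 = top_eigenvector(deflated)
    return [(dot(row, v1), dot(row, v2)) for row in centered]


# Hypothetical data: three tight groups in four dimensions.
centers = [(0, 0, 0, 0), (10, 0, 10, 0), (5, 9, 0, 9)]
offsets = [(0, 0), (-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(c[0] + dx, c[1] + dy, c[2], c[3]) for c in centers for dx, dy in offsets]

# Each pair is an (x, y) position that a scatter plot could encode.
projected = pca_2d(points)
for x, y in projected[:3]:
    print(round(x, 2), round(y, 2))
```

Swapping the DR step for another technique, or the scatter plot for a dendrogram built from a clustering result, changes what structure the viewer can see, which is the variable the study examines.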

Journal: Information Visualization
Publication Date: Jul 4, 2020
Citations: 16
License type: CC BY 4.0
