MCLEAN: Multilevel Clustering Exploration As Network.

Daniel Alcaide, Jan Aerts

doi:10.7717/peerj-cs.145

Abstract

Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets. In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representation of heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community finding algorithms. Our approach entails displaying the results of the heuristics to users, providing a setting from which to start the exploration and data analysis. To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean.

Highlights

Determining the number of clusters in a dataset is a frequent problem in data clustering, and is a distinct matter from the algorithm for solving the clustering problem.
We suggest a novel and generic clustering and exploration approach called MCLEAN (Multilevel Clustering Exploration As Network) for grouping and visualizing multiple granularities of the data that enables: (1) exploration of the dataset using an overview-plus-detail representation, (2) simplification of the dataset using aggregation based on the similarity of data elements, (3) detection of substructures by means of community detection algorithms, and (4) inclusion of the human in the process of selecting the number of clusters.
The evaluation exercise was split into three parts: detection of patterns using the barcode-tree, selection of thresholds comparing the dendrogram and barcode-tree, and detection of patterns combining the network representation and barcode-tree.

Summary

Introduction

Determining the number of clusters in a dataset is a frequent problem in data clustering, and is a distinct matter from the algorithm for solving the clustering problem. The correct number of groups is often ambiguous: it depends on the shape and scale of the points in the dataset and on the clustering resolution the user wants. The proposed framework allows the user to employ tacit knowledge in the clustering process in order to detect substructures. It provides a multilevel, overview-plus-detail environment that uses graphs to offer both a general outlook on the data grouping and the precise union of a subset of elements. The importance of visual interaction for performing clustering analysis is increasingly recognized (Nielsen et al., 2012), as expert users are capable of steering the analysis to produce more meaningful results. Including a human in the loop for making decisions and for guiding the analysis is essential (Vogogias et al., 2016).
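The ambiguity described above can be illustrated with a minimal sketch. This is not the MCLEAN implementation or its R package API. It is a toy example, assuming one-dimensional points and a hypothetical helper named `threshold_clusters`, that links every pair of points whose distance is at most a threshold and reads clusters off as the connected components of the resulting graph (equivalent to cutting a single-linkage dendrogram at that height). The toy data mixes two dense groups with two sparse groups, so the number of clusters depends entirely on the threshold, and no single threshold recovers all four groups.

```python
from itertools import combinations


def threshold_clusters(points, threshold):
    """Hypothetical helper: link points whose distance is <= threshold and
    return the connected components of that graph as sorted lists."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the component root (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (merge two components) for every sufficiently close pair.
    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect the points of each component.
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy heterogeneous data (an assumption for illustration only):
# two dense groups (spacing 1, separated by a gap of 3)
# and two sparse groups (spacing 4).
points = [0, 1, 2, 5, 6, 7, 50, 54, 58, 90, 94, 98]

# Number of clusters found at three different levels of detail.
counts = {t: len(threshold_clusters(points, t)) for t in (1, 4, 50)}

for t in (1, 4, 50):
    print(t, threshold_clusters(points, t))
```

At the smallest threshold the dense groups are found but the sparse ones fall apart into single points; at a threshold large enough to hold the sparse groups together, the two dense groups have already merged. Inspecting several levels side by side, rather than committing to one cut, is the situation a multilevel exploration is meant to support.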



Journal: PeerJ Computer Science
Publication Date: Jan 29, 2018
Citations: 4
License type: CC BY 4.0


