Treemap-Based Cluster Visualization and its Application to Text Data Analysis

Yasufumi Takama, Hiroki Shibata, Yuna Tanaka, Yoshiyuki Mori

doi:10.20965/jaciii.2021.p0498

Open Access

Abstract

This paper proposes a Treemap-based visualization for supporting cluster analysis of multi-dimensional data. Grasping the data distribution in a target dataset is important for tasks such as machine learning and cluster analysis. When dealing with multi-dimensional data such as statistical data and document datasets, dimensionality reduction algorithms are usually applied to project the original data onto a lower-dimensional space. However, dimensionality reduction tends to lose characteristics that the data have in the original space. In particular, the borders between different data groups may not be represented correctly in the lower-dimensional space. To overcome this problem, the proposed method applies Fuzzy c-Means to the target data and visualizes the result with Treemap on the basis of each object's highest and second-highest membership values. Visualizing not only the closest cluster but also the second-closest one is expected to help identify objects near the border between clusters and to clarify the relationships between clusters. A prototype interface is implemented, and its effectiveness is investigated through a user experiment on a news article dataset. As a case study on another kind of text data, the method is also applied to a word embedding space.
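
The core idea of the abstract, clustering with Fuzzy c-Means and then organizing objects by their highest and second-highest memberships, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuzzy_c_means`, `membership_row`, `top_two`, `group_by_top_two`), the fuzzifier value `m=2.0`, the fixed iteration count, and the center initialization are all assumptions made for illustration. The two-level grouping (closest cluster first, second-closest cluster nested inside it) is one plausible hierarchy that a Treemap could render, not a layout confirmed by the abstract.

```python
import math
import random


def membership_row(point, centers, m):
    """Fuzzy c-Means membership of one point to every cluster center."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        zero = dists.index(0.0)
        return [1.0 if j == zero else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain Fuzzy c-Means with a fixed number of iterations (assumed)."""
    rng = random.Random(seed)
    # Assumed initialization: start from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    dim = len(points[0])
    for _ in range(n_iter):
        memberships = [membership_row(p, centers, m) for p in points]
        new_centers = []
        for j in range(n_clusters):
            # Each center is the mean of all points weighted by u_ij ** m.
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        centers = new_centers
    memberships = [membership_row(p, centers, m) for p in points]
    return centers, memberships


def top_two(membership):
    """Indices of the clusters with the highest and second-highest membership."""
    order = sorted(range(len(membership)),
                   key=lambda j: membership[j], reverse=True)
    return order[0], order[1]


def group_by_top_two(memberships):
    """Nest object indices as {closest cluster: {second-closest cluster: [objects]}}."""
    groups = {}
    for i, u in enumerate(memberships):
        first, second = top_two(u)
        groups.setdefault(first, {}).setdefault(second, []).append(i)
    return groups
```

Calling `group_by_top_two` on the memberships returned by `fuzzy_c_means` yields a nested dictionary whose outer keys are closest clusters and whose inner keys are second-closest clusters. Objects whose two top membership values are nearly equal are the candidates for lying near a cluster border.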