Abstract

Clustering is an important task in knowledge discovery whose goal is to identify structures of similar data points in a dataset. Here, the focus lies on methods that use a human-in-the-loop, i.e., that incorporate user decisions into the clustering process through 2D and 3D displays of the structures in the data. Some of these interactive approaches fall into the category of visual analytics and emphasize the power of such displays to identify structures interactively in various types of datasets or to verify the results of clustering algorithms. This work presents a new method called interactive projection-based clustering (IPBC). IPBC is an open-source and parameter-free method that uses a human-in-the-loop for an interactive 2.5D display and identification of structures in data based on the user’s choice of a dimensionality reduction method. The IPBC approach is systematically compared with accessible visual analytics methods for the display and identification of cluster structures using twelve clustering benchmark datasets and one additional natural dataset. A qualitative comparison of 2D, 2.5D and 3D displays of structures and an empirical evaluation of the identified cluster structures show that IPBC outperforms comparable methods. Additionally, IPBC assists in identifying structures previously unknown to domain experts in an application.

Highlights

  • The term “visual analytics” was introduced as “the science of analytical reasoning facilitated by interactive visual interfaces” [1]

  • The topographic maps are presented in the form of a 3D display in comparison with the 3D display from the graphical clustering toolkit (gCLUTO) and the 2D display from VISTA

  • The results showed that interactive projection-based clustering (IPBC) outperformed VISTA and gCLUTO in terms of the adjusted Rand index on the clustering benchmark set of twelve datasets and the SCADI dataset

Summary

Introduction

The term “visual analytics” was introduced as “the science of analytical reasoning facilitated by interactive visual interfaces” [1]. It was pointed out that, based on current practice, a more specific definition would be that visual analytics combines automated analysis techniques with interactive displays of structures in data for effective understanding, reasoning, and decision making based on large and complex datasets [2]. In systems such as the visual cluster rendering system (VISTA) [3], the objective is to display the dataset in such a way that it would be easy. At the core of these problems lies the assumption that the distances in the scatterplot are directly proportional to the distances of the data points in a high-dimensional space.
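The proportionality assumption mentioned above can be probed numerically. The following is a minimal sketch, not the method of the paper: it assumes a plain PCA projection as the dimensionality reduction step, uses hypothetical toy data (two Gaussian blobs), and the helper names `pairwise_distances`, `pca_project` and `distance_preservation` are illustrative only. It compares all pairwise distances in the input space with the corresponding distances in the 2D scatterplot.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dist[upper]

def pca_project(data, n_components=2):
    """Linear projection onto the leading principal components (via SVD)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def distance_preservation(data, projected):
    """Pearson correlation between input-space and projected distances."""
    high = pairwise_distances(data)
    low = pairwise_distances(projected)
    return float(np.corrcoef(high, low)[0, 1])

rng = np.random.default_rng(0)
# Hypothetical toy data: two Gaussian blobs in 10 dimensions.
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))
blob_b = rng.normal(loc=5.0, scale=1.0, size=(50, 10))
data = np.vstack([blob_a, blob_b])

projected = pca_project(data)
score = distance_preservation(data, projected)
print(f"correlation of pairwise distances: {score:.3f}")
```

A correlation close to 1 suggests that the scatterplot distances track the high-dimensional distances for this dataset and projection. A high value is at best a necessary indication, since correlation does not establish strict proportionality, and nonlinear projection methods may behave quite differently.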

Results
Discussion
Conclusion
