Abstract

Dimensionality Reduction (DR) is a crucial tool for high-dimensional data analysis. As the volume and variety of features used to describe a phenomenon keep increasing, DR has become not only desirable but paramount. However, DR can result in unreliable depictions of data. The uncertainties involved in DR may stem from the selection of methods, parameter configurations, and the constraints imposed by the user. To address these uncertainties, various means of DR quality assessment have been proposed in the literature. Nevertheless, how to optimize the trade-off between the efficiency and the accuracy of this quantification has yet to be studied in depth. The purpose of this paper is to present a general technique, in the context of visual analytics, to support efficient uncertainty-aware high-dimensional data exploration. We model the uncertainty based on how well neighborhood geometries are preserved during DR. We employ approximate nearest neighbor (ANN) search algorithms to speed up the quantification process with only a marginal decrease in accuracy. We then visualize the quantified uncertainties in the form of an augmented scatter plot. We test our technique on three real-world datasets against several well-known DR techniques, and discuss possible underlying causes that lead to certain embedding patterns. Our results show that our approach is effective and beneficial for both DR assessment and user-centered data exploration.
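
To make the idea of neighborhood preservation concrete, the following is a minimal sketch and not the paper's actual measure, which the abstract does not specify. It assumes that a point's uncertainty can be scored as one minus the Jaccard overlap between its k nearest neighbors before and after DR. The names `knn_sets` and `neighborhood_uncertainty` are hypothetical, the "DR method" is a crude stand-in that keeps the first two coordinates, and the neighbor search is exact brute force; an ANN index would replace `knn_sets` to obtain the speed-up described above.

```python
import math
import random


def knn_sets(points, k):
    """Exact k-nearest-neighbor index sets by brute force (self excluded).

    Assumption: an ANN library would replace this function in practice.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def neighborhood_uncertainty(high, low, k=5):
    """Per-point score in [0, 1]: 1 minus the Jaccard overlap of the kNN sets.

    0 means the neighborhood is fully preserved; 1 means no neighbor survived.
    """
    nn_high = knn_sets(high, k)
    nn_low = knn_sets(low, k)
    scores = []
    for a, b in zip(nn_high, nn_low):
        shared = len(a & b)
        # Both sets have size k, so the union has 2k - shared elements.
        scores.append(1.0 - shared / (2 * k - shared))
    return scores


random.seed(0)
# Synthetic data: 100 points in 10 dimensions (illustrative only).
high = [tuple(random.gauss(0, 1) for _ in range(10)) for _ in range(100)]
# Crude stand-in for a DR method: keep the first two coordinates.
low = [p[:2] for p in high]
scores = neighborhood_uncertainty(high, low, k=5)
```

Scores of this kind could then drive a visual channel, such as point color or size, in a scatter plot of the embedding.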
