Abstract
The connections in a graph generate a structure that is independent of a coordinate system. This visual metaphor allows the creation of a more flexible representation of data than a two-dimensional scatterplot. In this article, we present STAD (Simplified Topological Abstraction of Data), a parameter-free dimensionality reduction method that projects high-dimensional data into a graph. STAD generates an abstract representation of high-dimensional data by giving each data point a location in a graph which preserves the approximate distances in the original high-dimensional space. The STAD graph is built upon the Minimum Spanning Tree (MST), to which new edges are added until the correlation between the distances in the graph and those in the original dataset is maximized. STAD also supports the inclusion of additional functions to focus the exploration and allow the analysis of data from new perspectives, emphasizing traits in the data which would otherwise remain hidden. We demonstrate the effectiveness of our method by applying it to two real-world datasets: traffic density in Barcelona and temporal measurements of air quality in Castile and León in Spain.
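The construction described in the abstract (start from the MST, then add edges while tracking the correlation between graph distances and original distances) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Euclidean input distances, unweighted hop counts as graph distances, Pearson correlation as the objective, candidate edges added in order of increasing distance, and an exhaustive scan over the number of added edges. The function names (`stad_sketch`, `hop_distances`, and so on) are hypothetical and chosen only for this example.

```python
import math
from collections import deque
from itertools import combinations


def pairwise_distances(points):
    """Dense Euclidean distance matrix (assumed metric for this sketch)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense matrix; returns a set of (i, j) edges with i < j."""
    n = len(dist)
    # best[j] = (cheapest known distance from the tree to j, tree node reaching it)
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    edges = set()
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.add((min(i, j), max(i, j)))
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges


def hop_distances(n, edges):
    """All-pairs shortest path lengths in hops, via one BFS per node."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    hops = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            u, d = queue.popleft()
            hops[source][u] = d
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append((v, d + 1))
    return hops


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def stad_sketch(points):
    """Return (edges, correlation) for the best graph found by brute-force scan."""
    n = len(points)
    dist = pairwise_distances(points)
    pairs = list(combinations(range(n), 2))
    original = [dist[i][j] for i, j in pairs]
    mst = minimum_spanning_tree(dist)
    # Non-MST edges, shortest first (an assumption of this sketch).
    candidates = sorted((p for p in pairs if p not in mst),
                        key=lambda p: dist[p[0]][p[1]])
    edges = set(mst)
    best_edges, best_corr = set(mst), -1.0
    for k in range(len(candidates) + 1):
        if k > 0:
            edges.add(candidates[k - 1])
        hops = hop_distances(n, edges)
        corr = pearson(original, [hops[i][j] for i, j in pairs])
        if corr > best_corr:
            best_corr, best_edges = corr, set(edges)
    return best_edges, best_corr
```

For example, `stad_sketch([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])` evaluates the MST alone and then the MST plus each additional candidate edge, and keeps whichever graph correlates best with the input distances. The exhaustive scan recomputes all shortest paths for every candidate count, so it is only practical for small inputs.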
Highlights
Data visualization is extensively used to reveal patterns and structures in data
The main novelty of STAD lies in the way distances are represented as a network, providing a two-dimensional mapping of data and connections between nodes which reinforce the communicated information in the plot
Through a qualitative assessment, we collect the advantages and disadvantages of STAD networks compared with other dimensionality reduction methods based on scatterplots
Summary
Data visualization is extensively used to reveal patterns and structures in data. The display of high-dimensional datasets, that is, point clouds with a high number of attributes, continues to be a relevant research field due to its wide range of applications. A visualization for analyzing the evolution of a high-dimensional time series requires a different approach than one for projecting a document corpus. While both aim to represent the data in a limited number of dimensions, the former emphasizes the progressive and continuous changes that occur over time, and the latter aims to find differences between groups of documents. Dimensionality reduction techniques allow for embedding high-dimensional data into a plot with two or three axes. These solutions provide a visual scalability advantage over classical scatterplot matrices and parallel coordinates [1]. The most recent methods, such as t-SNE [2] or UMAP [3], are effective in identifying similar elements and projecting them apart from other groups.
More From: IEEE transactions on visualization and computer graphics