Abstract
We propose in this paper a visualization approach for large online databases using the Hilbert space-filling curve to map N-dimensional data points to 2D or 3D points. Dimensionality reduction methods like principal component analysis (PCA), multi dimensional scaling (MDS) or self organizing maps (SOMS) can map N-dimensional data points with N≫3 into 3 dimensional or 2 dimensional values that allow us to visualize the data. These methods although popular, require either the calculation of a scatter matrix, eigenvalues and eigenvectors, or the iteration of learning algorithms. Therefore these methods cannot perform online, can be slow with large databases and always produce information loss when the data is mapped from the multidimensional space to the 2D or 3D image. Space-filling curves like the Peano, Z, and Hilbert curve, on the contrary, produce a 1-to-1 mapping between points in a line segment and an arbitrary N-Dimensional hypercube. This 1-to-1 mapping guarantees that there is no information loss on the transformation. Specifically the Hilbert space-filling curve is known to preserve the Lebesgue measure and has been proven to produce an optimal mapping in the sense that an arbitrary contiguous block of information will receive the minimum number of splits in the mapped space. The Hilbert space-filling curve has been extensively used for indexing and clustering by mapping N-dimensional data points to 1-dimensional values. We propose here to use the curve to map to 2 or 3 dimensions for purposes of visualization: By taking advantage of its 1-to-1 nature, a new and generic method to map N-dimensional data points to 2D or 3D points using the Hilbert space-filling curve is developed. We prove theoretically that the calculation of the mapping can be done in constant time if we fix the order of approximation, thereby giving linear O(n) performance on the number of data points to map. We create a Hilbert space-filling curve visualization tool that is much faster than the other methods mentioned and allows us to generate quickly for very large datasets various different visualizations of the data, thereby compensating the lack of use of statistical information in the calculation of the mapped points. We compare our approach to MDS and PCA with a benchmark data set and three real datasets using the distance preserving and topology preserving measure as benchmarks. Our experiments indicate that the Hilbert space-filling curve produces acceptable quality of mapping while achieving much faster visualization and is therefore especially useful for online visualization of very large data sets.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.