Abstract

Dimension reduction and visualization of high-dimensional data have become important research topics in data science because of the rapid growth of large, high-dimensional data sets. A successful dimension reduction and visualization method seeks to produce a low-dimensional representation of high-dimensional data that preserves both the global and the local structure of the data. In this paper, we propose using a generalized sigmoid function to model distance similarity in both the high- and the low-dimensional space. In particular, a single parameter v is introduced into the generalized sigmoid function in the low-dimensional space, so that the slope and the tail heaviness of the function can be adjusted simply by changing the value of v. Using real-world data sets with different sample sizes and dimensions, we show that our proposed method generates visualization results that are competitive with those of state-of-the-art methods such as uniform manifold approximation and projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), and related methods. In addition, by adjusting the value of v, our proposed method can preserve more of both the global structure and the finer cluster structure of the data. Furthermore, like UMAP, our proposed method scales easily to massive high-dimensional data. Finally, we use domain knowledge to demonstrate that the finer subclusters revealed with small values of v are meaningful.
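
The abstract does not state the functional form of the generalized sigmoid, so the following is only an illustrative sketch of how a single shape parameter can control both the slope and the tail of a distance similarity. The function name `sigmoid_similarity` and the form (2 / (1 + e^d))^v, a generalized logistic curve rescaled so that the similarity at distance zero is 1, are assumptions made for illustration and should not be read as the paper's definition.

```python
import math


def sigmoid_similarity(d: float, v: float) -> float:
    """Hypothetical similarity of two points at distance d >= 0.

    Assumed form (not taken from the paper): (2 / (1 + e^d)) ** v,
    a generalized logistic curve rescaled so the similarity at d = 0 is 1.
    The shape parameter v > 0 controls the slope and the tail heaviness.
    """
    if d < 0 or v <= 0:
        raise ValueError("d must be non-negative and v must be positive")
    # Work in log space for numerical stability at large d:
    # log(2 / (1 + e^d)) = log(2) - (d + log(1 + e^(-d)))
    log_base = math.log(2.0) - (d + math.log1p(math.exp(-d)))
    return math.exp(v * log_base)


if __name__ == "__main__":
    # Compare how quickly similarity decays for several values of v.
    for v in (0.25, 1.0, 4.0):
        row = ", ".join(f"{sigmoid_similarity(d, v):.4f}" for d in (0.0, 1.0, 2.0, 4.0))
        print(f"v={v}: {row}")
```

In this assumed form, a smaller v gives a shallower slope and a heavier tail, so moderately distant points keep a non-negligible similarity, while a larger v makes the similarity fall off sharply.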
