Abstract

Dimensionality reduction aims to obtain faithful low-dimensional representations of high-dimensional data while preserving the data's essential structure. It helps visualize high-dimensional data and can improve classification or clustering performance. Many dimensionality reduction methods have been developed within the framework of stochastic neighbor embedding. However, most of them use the Euclidean distance to measure the dissimilarity of data points in the high-dimensional space, which is ill-suited to high-dimensional data with a non-linear manifold structure. In addition, they typically use the family of normal distributions as the embedding distribution in the low-dimensional space, which restricts them to spherically shaped data. To address these issues, we present a novel dimensionality reduction method that integrates the Wasserstein distance and the t-copula function into the stochastic neighbor embedding model. We first employ a Gaussian distribution equipped with the Wasserstein distance to describe pairwise similarities in the high-dimensional space. Then, the t-copula function is used to generate a general heavy-tailed distribution describing the low-dimensional pairwise similarities, which can accommodate data of different shapes and avoid the crowding problem. Furthermore, the Kullback–Leibler divergence is employed to measure the discrepancy between the high-dimensional and low-dimensional similarities. Finally, a gradient descent algorithm with adaptive moment estimation is developed to minimize the proposed objective function. Extensive experiments on eight real-world datasets demonstrate the effectiveness of the proposed method in terms of dimensionality reduction quality and classification and clustering evaluation metrics.
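As a rough illustration of the pipeline the abstract outlines, the sketch below implements a minimal stochastic-neighbor-embedding loop: Gaussian similarities in the high-dimensional space, a heavy-tailed Student-t kernel in the low-dimensional space (a plain stand-in for the paper's t-copula; the Wasserstein-distance component is likewise omitted), a Kullback–Leibler objective, and Adam updates. The function name `sne_sketch` and all hyperparameter choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sne_sketch(X, dim=2, iters=200, lr=0.1, seed=0):
    """Minimal SNE-style embedding: Gaussian high-D similarities,
    a Student-t low-D kernel (stand-in for the paper's t-copula),
    and the KL(P || Q) objective minimized with Adam.
    Hypothetical sketch, not the method proposed in the paper."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # High-dimensional pairwise similarities P (Gaussian kernel,
    # single bandwidth set from the median squared distance).
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2.0 * np.median(d2[d2 > 0])))
    np.fill_diagonal(P, 0.0)
    P = np.maximum(P / P.sum(), 1e-12)

    Y = rng.normal(scale=1e-2, size=(n, dim))
    m = np.zeros_like(Y); v = np.zeros_like(Y)   # Adam moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, iters + 1):
        dy2 = np.sum((Y[:, None] - Y[None, :]) ** 2, axis=-1)
        num = 1.0 / (1.0 + dy2)                  # heavy-tailed (t, df=1) kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient of KL(P || Q) under the t kernel (as in t-SNE).
        PQ = (P - Q) * num
        grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y
        # Adam update with bias correction.
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        Y -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    kl = float(np.sum(P * np.log(P / Q)))
    return Y, kl
```

Replacing the Euclidean-based Gaussian similarity and the Student-t kernel with the Wasserstein-based similarity and the t-copula distribution, respectively, would recover the structure of the proposed method.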
