Abstract

As an unsupervised learning task, data dimensionality reduction suffers from a lack of evaluation methods. To address this, this paper takes three classical dimensionality reduction methods, PCA, t-SNE and UMAP, as its research objects. Five three-class datasets were selected, and each was reduced with the three methods. The reduced data were plotted as 3D scatter plots to analyze how well the different categories of the target variable are separated. The reduced data were then classified with a random forest model to obtain the classification accuracy. Based on the 3D scatter plots and the random forest accuracy, PCA shows a good dimensionality reduction effect on most of the selected datasets, and t-SNE shows a relatively stable effect. In contrast, UMAP performs well on some individual datasets but lacks stability. Overall, this paper proposes a dimensionality reduction evaluation method that combines scatter-plot visualization with classification models, which can effectively predict how dimensionality reduction methods perform on a variety of datasets, thereby supporting the comparison and selection of dimensionality reduction methods in unsupervised learning.
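The evaluation procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the Wine dataset, the standardization step, 5-fold cross-validation, the random forest settings, and the helper name `evaluate_reducers` are all assumptions made for illustration. UMAP is included only if the third-party `umap-learn` package happens to be installed. Because t-SNE cannot project unseen samples, the sketch embeds the whole dataset first and cross-validates on the embedding.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler


def evaluate_reducers(X, y, random_state=0):
    """Reduce X to 3 dimensions with each method, then score a random forest.

    Hypothetical helper: the dataset, scaling, and CV settings are
    illustrative assumptions, not the paper's configuration.
    """
    X_scaled = StandardScaler().fit_transform(X)

    reducers = {
        "PCA": PCA(n_components=3, random_state=random_state),
        "t-SNE": TSNE(n_components=3, random_state=random_state),
    }
    try:
        # Optional: requires the umap-learn package.
        import umap
        reducers["UMAP"] = umap.UMAP(n_components=3, random_state=random_state)
    except ImportError:
        pass  # Skip UMAP when the package is not available.

    scores = {}
    for name, reducer in reducers.items():
        # 3D embedding of the full dataset (also what a 3D scatter plot would show).
        embedding = reducer.fit_transform(X_scaled)
        clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        # Mean accuracy over 5 stratified folds on the embedded data.
        scores[name] = cross_val_score(clf, embedding, y, cv=5).mean()
    return scores


X, y = load_wine(return_X_y=True)  # a three-class dataset, chosen as an example
scores = evaluate_reducers(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Each `embedding` array has three columns, so it could also be passed to a 3D scatter plot colored by `y` to inspect class separation visually alongside the accuracy score.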
