Abstract

This paper addresses the dimensionality reduction and visualization problems associated with multivariate centroids obtained by clustering algorithms. Two approaches are used in the literature to solve such problems: the self-organizing map (SOM) approach and manually mapping two selected features (MS2Fs). In addition, principal component analysis (PCA) was evaluated as a component for solving this problem on supervised datasets. Each of these traditional approaches has drawbacks: if SOM runs with a small map size, all centroids are located contiguously rather than at their original distances according to the high-dimensional structure; MS2Fs is not an efficient method because it does not take the features outside the selected pair into account; and PCA is a supervised method and loses the most valuable feature. In this study, five novel hybrid approaches are proposed to eliminate these drawbacks by using the quantum genetic algorithm (QGA) method and four feature selection methods: Pearson’s correlation, gain ratio, information gain, and relief. Experimental results on 14 datasets of different sizes demonstrate that the prediction accuracy of the proposed weighted clustering approaches is higher than that of the traditional K-means++ clustering approach. Furthermore, the proposed approach that combines K-means++ and QGA produces the most efficient placements of the centroids on a two-dimensional map for all the test datasets.

Highlights

  • Human visual perception can be insufficient for the interpretation of a pattern within a multivariate structure, causing errors at the decision-making stage

  • Compatibility tests consider the difference between multivariate and 2D structures (DBM2). This section details the implementation of the tests on the proposed and traditional algorithms on 14 datasets of various sizes and the comparison of the test results

  • Weighted K-Means++ and Mapping by Relief (WMR) are weighted approaches based on four different feature selection methods (Pearson’s correlation, the gain ratio, information gain, and relief methods)
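
The idea of weighting a clustering distance by feature selection scores can be sketched as follows. This is only an illustration under stated assumptions, not the paper's formulation: it assumes that each feature is scored by the absolute Pearson correlation with a numeric class label, that scores are normalized to sum to 1, and that the weights enter a weighted Euclidean distance. The function names `pearson`, `feature_weights`, and `weighted_distance` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y) if dev_x > 0 and dev_y > 0 else 0.0


def feature_weights(rows, labels):
    """Assumed scheme: weight = |correlation with label|, normalized to sum to 1."""
    columns = list(zip(*rows))
    scores = [abs(pearson(column, labels)) for column in columns]
    total = sum(scores)
    if total == 0:
        # No feature is informative: fall back to equal weights.
        return [1.0 / len(scores)] * len(scores)
    return [score / total for score in scores]


def weighted_distance(a, b, weights):
    """Euclidean distance in which each squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Toy data (made up): feature 0 follows the label, feature 1 is constant.
rows = [(1.0, 5.0), (2.0, 5.0), (3.0, 5.0), (4.0, 5.0)]
labels = [0, 0, 1, 1]
weights = feature_weights(rows, labels)
```

In this toy case the constant feature receives zero weight, so `weighted_distance` ignores it entirely. A clustering routine such as K-means++ could then use `weighted_distance` in place of the plain Euclidean distance.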


Summary

Introduction

Human visual perception can be insufficient for the interpretation of a pattern within a multivariate (or high-dimensional) structure, causing errors at the decision-making stage. PCA is not typically successful at mapping the dataset in 2D, except in image representation and facial recognition studies [3, 4]. Another difficulty is that even if a dimensionality reduction is applied, the visualization of all instances in a large dataset causes storage and performance problems. New approaches are proposed to visualize the centroids of the clusters on a 2D map while preserving the original distances in the high-dimensional structure. DRV-P-MC is defined in detail in Section 2; related works providing guidance on DRV-P-MC, including traditional algorithms, hybrid approaches, and the notion of dimensionality reduction, are presented in Section 3; in Sections 4 and 5, the reorganized traditional approaches and the proposed algorithms are formulated and presented in detail; the experimental studies performed on 14 datasets with different characteristics and their accuracy results are given in Section 6; and Section 7 presents conclusions about the proposed methods.
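
One way to check whether a 2D placement preserves the original distances between centroids is sketched below. It is an assumption-laden illustration, not the paper's DBM2 measure, whose exact definition is not given in this summary: pairwise Euclidean distances are computed in both spaces, each set is scaled by its largest distance, and the mean absolute gap is reported. The names `pairwise_distances`, `normalize`, and `distance_mismatch` are hypothetical, and the data are made up.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def normalize(values):
    """Scale distances by the largest one so that both spaces are comparable."""
    largest = max(values)
    return [v / largest for v in values] if largest > 0 else list(values)


def distance_mismatch(centroids_nd, positions_2d):
    """Mean absolute gap between normalized pairwise distances in the two spaces.

    0.0 means the 2D map keeps the relative distances exactly.
    """
    high = normalize(pairwise_distances(centroids_nd))
    low = normalize(pairwise_distances(positions_2d))
    return sum(abs(h - l) for h, l in zip(high, low)) / len(high)


# Three 4-dimensional centroids lying on a line, 5 units apart.
centroids = [(0.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0), (6.0, 8.0, 0.0, 0.0)]
# A 2D layout that keeps the same relative spacing.
faithful = [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]
# A layout that packs the centroids into adjacent cells, as a small map might.
crowded = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]

faithful_score = distance_mismatch(centroids, faithful)
crowded_score = distance_mismatch(centroids, crowded)
```

Here `faithful_score` is zero and `crowded_score` is larger, which reflects the drawback described for small SOM maps: contiguous placement distorts the relative distances between centroids.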

Problem Definition
Related Works
Experimental Results and Discussions
Conclusions and Future Work
D: Dataset; F: Set of the features in D; f: Length of F; I