Abstract

An important step in multivariate analysis is dimensionality reduction, which enables better classification and easier visualization of the class structures in the data. Techniques such as PCA, PLS-DA and LDA are most often used to explore patterns in the data and to reduce its dimensionality. Yet, when these techniques are applied, the structures in the data are not always properly revealed. To address this, a supervised projection pursuit (SuPP) based on the Jensen-Shannon divergence is proposed in this article. The combination of this metric with a powerful Monte Carlo-based optimization algorithm yielded a versatile dimensionality reduction technique capable of handling high-dimensional data and missing observations. Combined with a Naïve Bayes (NB) classifier, SuPP proved to be a powerful preprocessing tool for classification. On the Iris data set, the prediction accuracy of SuPP-NB is significantly higher than that of PCA-NB (p-value ≤ 4.02E-05 in a 2D latent space, p-value ≤ 3.00E-03 in a 3D latent space) and significantly higher than that of PLS-DA (p-value ≤ 1.17E-05 in a 2D latent space, p-value ≤ 3.08E-03 in a 3D latent space). The significantly higher accuracy on this particular data set is strong evidence of better class separation in the latent spaces obtained with SuPP.
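The abstract names the two ingredients of SuPP, a Jensen-Shannon divergence criterion and a Monte Carlo optimizer, without giving their details. The sketch below is an illustration under stated assumptions, not the authors' algorithm. It scores a one-dimensional projection by the mean pairwise Jensen-Shannon divergence, JS(P‖Q) = ½KL(P‖M) + ½KL(Q‖M) with M = ½(P + Q), between histograms of the projected classes. It then uses plain random search over unit directions as the simplest Monte Carlo optimizer. The helper names (js_divergence, projection_index, monte_carlo_pursuit), the histogram estimator, the bin count and the trial count are all hypothetical. The article works in 2D and 3D latent spaces and handles missing observations; this sketch does neither.

```python
import math
import random


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions p and q (in bits by default)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i = m_i > 0 whenever a_i > 0.
        return sum(ai * math.log(ai / bi, base) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histogram(values, lo, hi, bins):
    """Normalized histogram of values on the common range [lo, hi]."""
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def projection_index(X, y, w, bins=10):
    """Hypothetical index: mean pairwise JS divergence between class histograms of X projected on w."""
    z = [sum(xi * wi for xi, wi in zip(x, w)) for x in X]
    lo, hi = min(z), max(z)
    classes = sorted(set(y))
    hists = {c: histogram([zi for zi, yi in zip(z, y) if yi == c], lo, hi, bins) for c in classes}
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    return sum(js_divergence(hists[a], hists[b]) for a, b in pairs) / len(pairs)


def random_unit_vector(dim, rng):
    """Direction drawn uniformly on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def monte_carlo_pursuit(X, y, n_trials=500, seed=0):
    """Plain random search: keep the random direction with the highest projection index."""
    rng = random.Random(seed)
    best_w, best_score = None, -1.0
    for _ in range(n_trials):
        w = random_unit_vector(len(X[0]), rng)
        score = projection_index(X, y, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Toy data (not from the article): the classes differ only in feature 1,
# while feature 0 carries large, label-independent noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 5), data_rng.gauss(0, 0.3)] for _ in range(50)] + \
    [[data_rng.gauss(0, 5), data_rng.gauss(3, 0.3)] for _ in range(50)]
y = [0] * 50 + [1] * 50
w, score = monte_carlo_pursuit(X, y, n_trials=300)
print(w, score)  # the best direction should lie close to the feature-1 axis
```

Random search is the crudest Monte Carlo optimizer. A practical implementation would more likely refine promising directions iteratively and search for two or three orthogonal directions to build a 2D or 3D latent space.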

Highlights

  • Dimensionality reduction (DR) techniques, referred to as projection methods, are perhaps the most widely used exploratory tools in fields ranging from image analysis and information retrieval to bioinformatics and chemometrics

  • The 2D representations in Fig. 4a, b and c indicate better class separation with supervised projection pursuit (SuPP) (Fig. 4a), while the arrangement of the groups in the latent space is similar to that obtained with principal component analysis (PCA) (Fig. 4b) and partial least squares discriminant analysis (PLS-DA) (Fig. 4c)

  • The SuPP strategy described here is a versatile dimensionality reduction technique that offers a new perspective on supervised exploratory data analysis


Summary

Introduction

Dimensionality reduction (DR) techniques, referred to as projection methods, are perhaps the most widely used exploratory tools in fields ranging from image analysis and information retrieval to bioinformatics and chemometrics. Projection techniques can be classified into three major groups according to how the latent components are obtained: supervised (class labels are used to derive the latent components and for subsequent classification), semi-supervised (both labeled and unlabeled samples are used to infer class structures in the latent space) and unsupervised (class labels are either unavailable and must be inferred from the structural patterns of the projected data, or are not used). Each of these three types of DR methods can be further divided into “linear” and “nonlinear” methods. We mention a few methods from each category that are either recent or widely used across domains.
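To make the supervised/unsupervised distinction concrete for linear methods, the sketch below contrasts an unsupervised projection with a supervised one. The unsupervised projection is the first principal component, which ignores the labels. The supervised one is Fisher's linear discriminant direction, w ∝ S_W^-1 (m1 − m0), which uses them. The two-class toy data and the helper names (pca_direction, fisher_direction) are hypothetical and not taken from the article. The classes differ only along the y axis, while most of the total variance lies along x.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, center):
    # 2x2 scatter matrix (sum of outer products of centered points) as (sxx, sxy, syy)
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    return sxx, sxy, syy


def normalize(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def pca_direction(points):
    """Unsupervised: leading eigenvector of the total scatter matrix; labels are ignored."""
    sxx, sxy, syy = scatter(points, mean(points))
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the major axis
    return (math.cos(theta), math.sin(theta))


def fisher_direction(class0, class1):
    """Supervised: Fisher discriminant w ∝ Sw^-1 (m1 - m0), which requires the labels."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det
    return normalize(((c * dx - b * dy) / det, (-b * dx + a * dy) / det))


# Hypothetical toy data: wide spread along x, class offset only along y.
xs = range(-10, 11)
jitter = [0.1 if x % 2 == 0 else -0.1 for x in xs]
class0 = [(float(x), d) for x, d in zip(xs, jitter)]
class1 = [(float(x), 1.0 + d) for x, d in zip(xs, jitter)]

pca_dir = pca_direction(class0 + class1)       # close to (1, 0): maximal spread, classes overlap
fisher_dir = fisher_direction(class0, class1)  # close to (0, 1): the axis that separates the classes
print(pca_dir, fisher_dir)
```

On data like this, the unsupervised direction preserves the dominant variance but discards the class structure, which is the situation in which supervised projections are expected to help.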
