Abstract

Purpose: This study sought to review the characteristics, strengths, weaknesses, variants, application areas and data types associated with the various Dimension Reduction (DR) techniques. Methodology: The databases most commonly used to search for the papers were ScienceDirect, Scopus, Google Scholar, IEEE Xplore and Mendeley. An integrative review was conducted, covering 341 papers. Results: The linear techniques considered were Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), Latent Semantic Analysis (LSA), Locality Preserving Projections (LPP), Independent Component Analysis (ICA) and Projection Pursuit (PP). The non-linear techniques, developed to work with applications that have complex non-linear structures, were Kernel Principal Component Analysis (KPCA), Multi-dimensional Scaling (MDS), Isomap, Locally Linear Embedding (LLE), Self-Organizing Map (SOM), Learning Vector Quantization (LVQ), t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). DR techniques can further be categorized into supervised, unsupervised and, more recently, semi-supervised learning methods. Of these, LDA and LVQ are supervised; all the other techniques are unsupervised. Supervised variants of PCA, LPP, KPCA and MDS have been developed; supervised and semi-supervised variants of PP and t-SNE, and a semi-supervised version of LDA, have also been developed. Conclusion: The application areas, strengths, weaknesses and variants of the DR techniques were explored, together with the data types to which the various techniques have been applied.
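
To make the supervised/unsupervised distinction concrete, the minimal sketch below contrasts PCA (unsupervised: labels are ignored) with LDA (supervised: labels drive the projection). The library (scikit-learn) and the iris dataset are illustrative assumptions and are not drawn from the reviewed papers.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# PCA is unsupervised: it ignores y and keeps the directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it uses the class labels y to find directions that best
# separate the classes (at most n_classes - 1 = 2 components here).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Both embeddings are two-dimensional, but only LDA exploits the class structure, which is why supervised and semi-supervised variants matter when labels, or partial labels, are available.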

Highlights

  • The world in recent times has seen huge amounts of data being churned out in different areas of application, resulting in exponential growth in the complexity, heterogeneity, dimensionality and size of data [1]

  • The non-linear techniques, developed to work with applications that have complex non-linear structures, were Kernel Principal Component Analysis (KPCA), Multi-dimensional Scaling (MDS), Isomap, Locally Linear Embedding (LLE), Self-Organizing Map (SOM), Learning Vector Quantization (LVQ), t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP); a non-linear example is sketched after this list

  • The area of dimension reduction is becoming increasingly relevant in application areas such as healthcare, economics, the environment, social science, agriculture and many more because of the sheer amount of data being generated in the era of big data
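
As referenced in the second highlight, the minimal sketch below shows why non-linear techniques are needed: Isomap unrolls the classic "Swiss roll" manifold, a structure that no linear projection can flatten. The library (scikit-learn), the synthetic dataset and the parameter values are illustrative assumptions, not taken from the reviewed papers.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# A 3-D point cloud lying on a rolled-up 2-D sheet (a non-linear manifold).
X, colour = make_swiss_roll(n_samples=1000, random_state=0)

# Isomap preserves geodesic (along-the-manifold) distances rather than
# straight-line distances, so it can "unroll" the sheet into 2-D.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X.shape, X_iso.shape)  # (1000, 3) (1000, 2)
```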


Summary

Introduction

The world in recent times has seen huge amounts of data being churned out in different areas of application, resulting in exponential growth in the complexity, heterogeneity, dimensionality and size of data [1]. Areas such as education, medicine, the web, social media and business are inundated with huge amounts of data in this era of Information and Communication Technology (ICT) [2]. The field of machine learning has evolved rapidly to help address this problem, with a focus on computer programs that can access data and use it to learn for themselves.

