Abstract

In data mining research and development, one of the defining challenges is performing classification or clustering on data with few samples and many dimensions, known as the high-dimensional, limited-sample-size (HDLSS) problem. Because the sample size is limited, there is not enough training data to fit classification models. The ‘curse of dimensionality’ also restricts the effectiveness of many methods for solving the HDLSS problem. A classification model trained on a limited-sample dataset tends to overfit and cannot achieve satisfactory results. Unsupervised methods are therefore a better choice for such problems. Given the emergence of deep learning, its many applications, and its promising outcomes, an extensive analysis of deep learning techniques on HDLSS datasets is needed. This paper evaluates the performance of variational autoencoder (VAE) based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of the VAE is compared with two existing techniques, principal component analysis (PCA) and non-negative matrix factorization (NMF), on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and normalized mutual information (NMI). The experimental results show the superiority of the VAE over the traditional methods on HDLSS datasets.
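To make the three evaluation metrics concrete, the following is a minimal sketch of how purity, Rand index, and NMI might be computed from ground-truth class labels and predicted cluster labels. It is not the paper's code: the function names are illustrative, and normalizing NMI by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which normalization is used.

```python
from collections import Counter
from itertools import combinations
from math import log


def purity(labels_true, labels_pred):
    """Fraction of samples that fall in the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies.
    Assumption: arithmetic-mean normalization; other variants exist."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    ct, cp = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]
    print(purity(y_true, y_pred), rand_index(y_true, y_pred), nmi(y_true, y_pred))
```

All three scores are invariant to how cluster IDs are numbered, which is why they suit unsupervised evaluation: a clustering that matches the classes exactly scores 1 on each metric even if its cluster IDs are a permutation of the class labels.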
