Abstract

The dimensionality of a dataset is the number of variables, traits, or features it contains. The objective of dimensionality reduction is to represent the data with as few columns as possible. Machine learning (ML) is a data-driven technique in which computers learn without human intervention by analyzing large amounts of historical data, and it has found applications in a variety of fields. Supervised learning, unsupervised learning, and reinforcement learning are the three primary learning paradigms in ML. This work provides a thorough comparison of several approaches to dimensionality reduction. Unsupervised learning means training a model on an unlabeled dataset: the model learns the characteristics of the training data on its own and then uses them to draw inferences about the test data. K-means clustering, agglomerative clustering, principal component analysis (PCA), and fuzzy C-means are just a few of the many unsupervised learning methods and algorithms available. For disease diagnosis based on gene or protein expression data using support vector machines with a radial basis function kernel, we present empirical results comparing PCA with other dimension reduction (DR) methods, namely k-means clustering, agglomerative clustering, the Apriori algorithm, and fuzzy C-means. Among all these algorithms, PCA gives the best results on the gene and protein datasets in terms of accuracy, cross-validation rates, and computing time. Our research shows that PCA not only outperforms the other DR methods in classification accuracy, but is also significantly more efficient, since it does not require the optimization of a tuning parameter. The results lead us to believe that PCA might be a useful DR method for developing a trustworthy and unbiased classifier for gene or protein expression data.
