Abstract

Classification methods are fundamental techniques for building mathematical models that assign each object to its proper class on the basis of a set of measurements. When the number of variables in an experiment is large, classifying objects into groups becomes prone to misclassification. This study explores approaches to the classification problem with a large number of independent variables using two parametric methods, namely PLS-DA and PCA+LDA. Data were generated with a data simulator in Azure Machine Learning (AML) Studio through a custom R module. The performance of PLS-DA was evaluated and compared with that of the PCA+LDA model for different numbers of variables (p) and different sample sizes (n), using the minimum misclassification rate as the criterion. The results show that PLS-DA performed better than PCA+LDA for large sample sizes. PLS-DA can therefore be considered a good and reliable technique for classification tasks on large datasets.

Highlights

  • This study focuses on parametric methods only, namely Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA)

  • The results indicate a smaller misclassification rate under PLS-DA than under Principal Component Analysis (PCA)+LDA

  • As the sample size gets larger, the misclassification rate for PLS-DA becomes smaller
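
The comparison criterion named above, the misclassification rate, is the fraction of objects assigned to the wrong class. A minimal Python sketch of that calculation follows. The function name `misclassification_rate` and the label vectors are hypothetical and made up purely for illustration; they are not the study's code, data, or results.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Return the fraction of objects assigned to the wrong class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    # Count positions where the predicted class differs from the true class.
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Made-up labels for illustration only; these are NOT results from the study.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
pred_model_a = [0, 0, 1, 1, 0, 0, 1, 0]  # hypothetical predictions, 1 error
pred_model_b = [0, 1, 1, 0, 0, 0, 1, 0]  # hypothetical predictions, 3 errors

rate_a = misclassification_rate(y_true, pred_model_a)
rate_b = misclassification_rate(y_true, pred_model_b)
print(f"model A: {rate_a:.3f}, model B: {rate_b:.3f}")
```

Under this criterion the model with the lower rate is preferred. In a simulation study the rate would normally be computed on held-out observations for each combination of p and n.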

Introduction

Classification methods act as classifiers, serving as predictive and descriptive models as well as tools for discriminative variable selection. The purpose of classification is to achieve a minimum misclassification rate. Classification methods can be grouped into three types: parametric, non-parametric and semi-parametric. According to [14], parametric methods are more reliable than non-parametric methods when the data are normally distributed and exhibit a bell-shaped curve. Examples of parametric methods used for classification
