Abstract

This paper examines a strategy for extracting variables from mixed data when building a linear discriminant analysis (LDA) model. Two methods were employed to extract the crucial variables from a dataset containing both categorical and continuous variables: multiple correspondence analysis (MCA) and principal component analysis (PCA). Neither MCA nor PCA can be applied directly to mixed variables, because each method is restricted in the structure of data it can handle. The paper therefore introduces some adjustments, including a strategy for managing the mixed variables so that they are equivalent in values, which allows both MCA and PCA to be performed on the mixed variables simultaneously. The variables extracted under this strategy were then used to construct the LDA model, which was subsequently applied to classify future objects. The proposed models were tested on three real medical data sets. The results indicated that combining MCA and PCA for extraction with LDA could reduce the size of the model and improve its classification performance, since it leads towards minimising the leave-one-out error rate. Accordingly, the proposed models, including the adapted strategy, gave better results than the full LDA model. The indicators used to extract and retain variables in the model, namely the cumulative variance explained (CVE), the eigenvalue, and a non-significant shift in the CVE (constant change), could serve as a useful reference or guideline for practitioners facing similar issues in future.
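The three retention indicators named above (CVE, eigenvalue, and a non-significant change in the CVE) can be illustrated with a minimal sketch. The function name `select_components` and the thresholds (80% CVE, eigenvalue greater than 1, a 5% minimum gain) are assumptions chosen for illustration; the abstract does not state the cut-offs the authors used.

```python
def select_components(eigenvalues, cve_target=0.80, min_eigenvalue=1.0, min_gain=0.05):
    """Count how many leading components each retention rule would keep.

    All three thresholds are illustrative assumptions, not values from the paper.
    """
    eigenvalues = sorted(eigenvalues, reverse=True)
    total = sum(eigenvalues)

    # Share of the total variance carried by each component.
    proportions = [ev / total for ev in eigenvalues]

    # Cumulative variance explained (CVE) after each component.
    cve = []
    running = 0.0
    for p in proportions:
        running += p
        cve.append(running)

    # Rule 1: smallest number of components whose CVE reaches the target.
    by_cve = next(
        (i + 1 for i, c in enumerate(cve) if c >= cve_target - 1e-12),
        len(eigenvalues),
    )

    # Rule 2: keep components whose eigenvalue exceeds the threshold.
    by_eigenvalue = sum(1 for ev in eigenvalues if ev > min_eigenvalue)

    # Rule 3: keep components while each one still adds at least
    # min_gain to the CVE (a smaller gain is treated as "constant change").
    by_gain = sum(1 for p in proportions if p >= min_gain)

    return {"cve": by_cve, "eigenvalue": by_eigenvalue, "gain": by_gain}


# Hypothetical eigenvalues from a PCA or MCA step.
print(select_components([3.0, 1.5, 0.8, 0.5, 0.2]))
```

For these made-up eigenvalues the three rules keep 3, 2 and 4 components respectively, which shows why the indicators are best read together rather than in isolation.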

Highlights

  • Linear discriminant analysis (LDA) is frequently favoured in classification problems when explanatory variables have multivariate normal distribution, and the populations share an identical or uniform covariance matrix (Nazman & Erbas, 2017)

  • This study investigated the capability of the suggested strategy in managing a number of mixed variables for the purpose of overcoming classification problems

  • The use and application of variable extraction methods (i.e. principal component analysis (PCA) and multiple correspondence analysis (MCA)) have been demonstrated in solving problems associated with classification tasks

Summary

Introduction

Linear discriminant analysis (LDA) is frequently favoured in classification problems when the explanatory variables follow a multivariate normal distribution and the populations share an identical covariance matrix (Nazman & Erbas, 2017). The difficulty of meeting such conditions could be overcome by altering the mathematical functions in LDA, for example by relaxing its reliance on certain calculations when computing the inverse covariance matrix (Tarr et al., 2016). Despite this, limited research has addressed this possibility. Selecting variables is a reasonably straightforward method; however, it is sensitive to correlation between the variables, which may cause issues in the analysis when the number of variables is large (Zhang et al., 2017).
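For reference, the dependence on the inverse covariance matrix mentioned above is visible in the textbook form of the LDA rule. The notation is standard rather than taken from the paper: $\mu_k$ is the mean of class $k$, $\Sigma$ the covariance matrix shared by all classes, and $\pi_k$ the prior probability of class $k$.

```latex
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
            - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
            + \log \pi_k,
\qquad
\hat{y}(x) = \arg\max_{k}\, \delta_k(x)
```

An object $x$ is assigned to the class with the largest score $\delta_k(x)$, so an unstable or singular estimate of $\Sigma$ affects every score directly.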
