Abstract

Linear discriminant analysis (LDA)-based classifiers tend to falter in many practical settings where the training data size is smaller than, or comparable to, the number of features. As a remedy, different regularized LDA (RLDA) methods have been proposed. These methods may still perform poorly depending on the size and quality of the available training data. In particular, deviation of the test data from the training data model, for example, due to noise contamination, can cause severe performance degradation. Moreover, these methods commit further to the Gaussian assumption (upon which LDA is established) to tune their regularization parameters, which may compromise accuracy when dealing with real data. To address these issues, we propose a doubly regularized LDA classifier that we denote as R2LDA. In the proposed R2LDA approach, the RLDA score function is converted into an inner product of two vectors. By substituting the expressions of the regularized estimators of these vectors, we obtain the R2LDA score function that involves two regularization parameters. To set the values of these parameters, we adopt three existing regularization techniques: the constrained perturbation regularization approach (COPRA), the bounded perturbation regularization (BPR) algorithm, and the generalized cross-validation (GCV) method. These methods are used to tune the regularization parameters based on linear estimation models, with the square root of the sample covariance matrix as the linear operator. Results obtained from both synthetic and real data demonstrate the consistency and effectiveness of the proposed R2LDA approach, especially in scenarios involving test data contaminated with noise that is not observed during the training phase.
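The following is a minimal NumPy sketch of the inner-product formulation described above: the LDA score is split into two vectors, each obtained as a ridge-type regularized solution of a linear model whose operator is the square root of the sample covariance matrix, with its own regularization parameter tuned here by a simple GCV grid search. The function names, the eigen-decomposition-based solver, and the grid of candidate parameters are illustrative assumptions; the paper's COPRA and BPR tuning rules are not reproduced here.

```python
import numpy as np

def _reg_solve(w, V, y, gamma):
    """Ridge-type solution of S^{1/2} v = y, i.e. v = (S + gamma*I)^{-1} S^{1/2} y,
    using the eigen-decomposition S = V diag(w) V^T (illustrative solver)."""
    return (V * (np.sqrt(w) / (w + gamma))) @ (V.T @ y)

def _gcv(w, V, y, gamma):
    """GCV score for the linear model y = S^{1/2} v + noise.
    Hat matrix: H = S^{1/2} (S + gamma*I)^{-1} S^{1/2} = V diag(w/(w+gamma)) V^T."""
    p = y.size
    h = w / (w + gamma)                     # eigenvalues of the hat matrix
    resid = y - (V * h) @ (V.T @ y)         # (I - H) y
    return p * (resid @ resid) / (p - h.sum()) ** 2

def r2lda_score(x, mu1, mu2, Sigma, gammas=np.logspace(-4, 2, 25)):
    """Sketch of a doubly regularized LDA score with two separately tuned parameters."""
    w, V = np.linalg.eigh(Sigma)
    w = np.clip(w, 0.0, None)               # guard against tiny negative eigenvalues
    y_a = x - 0.5 * (mu1 + mu2)             # data-dependent vector
    y_b = mu1 - mu2                         # class-separation vector
    gamma_a = min(gammas, key=lambda g: _gcv(w, V, y_a, g))
    gamma_b = min(gammas, key=lambda g: _gcv(w, V, y_b, g))
    a = _reg_solve(w, V, y_a, gamma_a)
    b = _reg_solve(w, V, y_b, gamma_b)
    return a @ b                            # compare against a bias/threshold to classify
```

With two parameters, the data-dependent vector and the class-separation vector are regularized independently, which is what distinguishes this sketch from a single-parameter RLDA score.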

Highlights

  • The idea of linear discriminant analysis (LDA) was originally conceived by Fisher [1] and is based on the assumption that the data follows a Gaussian distribution with a common class covariance matrix.

  • This paper develops a doubly regularized LDA (R2LDA) classifier by expressing the LDA score function as an inner product of two vectors that are linearly related to the mean vectors and the data covariance matrix.

  • We summarize our main innovations and the most prominent features of the proposed R2LDA approach as follows: (a) we deviate from the classical covariance matrix estimation approach to regularized LDA (RLDA), where the focus is on obtaining a regularized linear estimator of the data covariance matrix.

Summary

INTRODUCTION

The idea of linear discriminant analysis (LDA) was originally conceived by Fisher [1] and is based on the assumption that the data follows a Gaussian distribution with a common class covariance matrix. The performance of LDA-based classifiers depends heavily on accurate estimation of the class statistics, namely, the sample covariance matrix and the class mean vectors. These statistics can be estimated with fairly high accuracy when the number of available samples is large compared to the data dimensionality; when it is not, regularized LDA (RLDA) methods are used as a remedy. In these RLDA approaches, the regularization parameter is tuned based only on the data available in the training phase, so their performance might deteriorate significantly when the test data deviates from the training data model, for example, due to noise contamination. Moreover, these methods rely on the Gaussian assumption on the underlying data distribution to find the value of the regularization parameter.
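To make the role of the class statistics concrete, below is a minimal sketch (assumed Python/NumPy code, not the paper's implementation) of the plain LDA discriminant computed from class means and a pooled sample covariance. When the number of training samples is small relative to the dimensionality, the pooled covariance becomes singular and its inverse is ill-posed, which is the regime that RLDA and the proposed R2LDA address; the pseudo-inverse fallback used here is only to keep the sketch runnable.

```python
import numpy as np

def lda_score(x, X1, X2):
    """Plain LDA score for a test point x, given training samples X1, X2 (rows = samples)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = X1.shape[0], X2.shape[0]
    # Pooled sample covariance under the common class covariance assumption.
    S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n1 + n2 - 2)
    # For n1 + n2 - 2 < dimensionality, S is singular; the pseudo-inverse keeps this
    # sketch runnable but highlights why regularized estimators are needed.
    return (x - 0.5 * (mu1 + mu2)) @ np.linalg.pinv(S) @ (mu1 - mu2)
```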

RLDA CLASSIFICATION
REGULARIZATION PARAMETER SELECTION
SUMMARY OF THE PROPOSED R2LDA APPROACH
PERFORMANCE EVALUATION
DATASETS DESCRIPTION
Synthetic Data
CONCLUSION