Abstract

This paper studies the dimension effect of the linear discriminant analysis (LDA) and regularized linear discriminant analysis (RLDA) classifiers for large-dimensional data, where the observation dimension $p$ is of the same order as the sample size $n$. Building on properties of the Wishart distribution and recent results in random matrix theory, we derive explicit expressions for the asymptotic misclassification errors of LDA and RLDA, respectively. These expressions show how, and in what sense, the dimension affects classification performance. Motivated by these results, we propose adjusted classifiers that correct the bias caused by unequal sample sizes. The bias-corrected LDA and RLDA classifiers are shown to have smaller misclassification rates than LDA and RLDA, respectively. Several interesting examples are discussed in detail, and the theoretical results on the dimension effect are illustrated through extensive simulation studies.
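
A minimal Python sketch of the two plug-in rules compared in this paper, using only the standard library. It assumes equal priors and the ridge-type form $S + \lambda I$ for RLDA. Under these assumptions, an observation $x$ is assigned to class 1 when $(x - (\bar{x}_1 + \bar{x}_2)/2)^\top (S + \lambda I)^{-1} (\bar{x}_1 - \bar{x}_2) > 0$, where $S$ is the pooled sample covariance. Setting $\lambda = 0$ recovers LDA, which needs $S$ to be invertible and hence roughly $n > p$. The helper names (`mean_vector`, `pooled_covariance`, `solve`, `fit_rlda`) are hypothetical. The sketch does not implement the paper's bias correction.

```python
import random


def mean_vector(rows):
    """Column-wise sample mean of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(class1, class2):
    """Pooled sample covariance S with n1 + n2 - 2 degrees of freedom."""
    p = len(class1[0])
    cov = [[0.0] * p for _ in range(p)]
    for rows in (class1, class2):
        m = mean_vector(rows)
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, m)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    dof = len(class1) + len(class2) - 2
    return [[c / dof for c in row] for row in cov]


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        z[k] = (m[k][n] - sum(m[k][c] * z[c] for c in range(k + 1, n))) / m[k][k]
    return z


def fit_rlda(class1, class2, lam=0.0):
    """Return a rule x -> 1 or 2. Assumes the S + lam * I form; lam = 0 is LDA."""
    xbar1, xbar2 = mean_vector(class1), mean_vector(class2)
    s = pooled_covariance(class1, class2)
    p = len(xbar1)
    s_reg = [[s[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(xbar1, xbar2)]
    mid = [(a + b) / 2 for a, b in zip(xbar1, xbar2)]
    w = solve(s_reg, diff)  # w = (S + lam I)^{-1} (xbar1 - xbar2)

    def classify(x):
        # Equal priors: threshold the discriminant score at zero.
        score = sum((xi - mi) * wi for xi, mi, wi in zip(x, mid, w))
        return 1 if score > 0 else 2

    return classify


# Usage: two Gaussian classes in p = 4 dimensions with unequal sample sizes.
rng = random.Random(0)
class1 = [[rng.gauss(1.0, 1.0) for _ in range(4)] for _ in range(30)]
class2 = [[rng.gauss(-1.0, 1.0) for _ in range(4)] for _ in range(20)]
rule = fit_rlda(class1, class2, lam=0.5)
print(rule([1.0, 1.0, 1.0, 1.0]), rule([-1.0, -1.0, -1.0, -1.0]))
```

Increasing `lam` shrinks the rule toward one based on the identity matrix, which keeps the linear solve well conditioned as $p$ approaches $n$. This sketch does not address how to choose λ.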

Highlights

  • Discriminant analysis, which aims to allocate objects to one of several predefined classes, has been an important topic in statistical learning and data analysis

  • How high dimensionality affects the classification accuracy of linear discriminant analysis (LDA) and regularized linear discriminant analysis (RLDA) when the observation dimension p is of the same order as the sample size n is still not well understood; this work provides a comprehensive analysis of the misclassification error rates under mild conditions

  • This paper provides theoretical studies of the dimension effect of LDA and RLDA, giving insight into how increasing dimension affects their classification accuracy

Summary

Introduction

Discriminant analysis, which aims to allocate objects to one of several predefined classes, has been an important topic in statistical learning and data analysis. The results were based on the random effects hypothesis, which is a strong assumption. In the existing literature, the question of how high dimensionality affects the classification accuracy of LDA and RLDA when the observation dimension p is of the same order as the sample size n is still not well understood; in this work we provide a comprehensive analysis of the misclassification error rates under mild conditions. To study the dimension effect of RLDA, the key step is to find the asymptotic limits of the moments of a class of random matrices and of the random quadratic forms that involve the population means. The former has been studied by Ledoit and Péché (2011), Chen et al. (2011) and our previous work Wang et al. (2015).
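Moments of random matrices drive the dimension effect. The simplest case shows why, and it is assumed here for illustration only. Take Gaussian, zero-mean data with $\Sigma = I_p$, so that $W = mS$ follows a Wishart distribution $W_p(m, I_p)$. For a two-class pooled covariance, the centring is absorbed into the degrees of freedom, $m = n_1 + n_2 - 2$. The classical identity $E[W^{-1}] = I_p/(m - p - 1)$ for $m > p + 1$ gives $E[\mathrm{tr}(S^{-1})]/p = m/(m - p - 1)$. This ratio tends to $1/(1 - y)$ when $p/m \to y \in (0, 1)$, so the inverse sample covariance overshoots $\Sigma^{-1}$ more and more as $p$ approaches $m$. The Python sketch below checks this identity by Monte Carlo. Its function names are hypothetical, and it is not the derivation used in the paper.

```python
import random


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (no singularity checks)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        pk = m[k][k]
        m[k] = [v / pk for v in m[k]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != k:
                f = m[r][k]
                m[r] = [vr - f * vk for vr, vk in zip(m[r], m[k])]
    return [row[n:] for row in m]


def wishart_identity(p, m, rng):
    """W = sum of m outer products z z^T with z ~ N(0, I_p), so W ~ Wishart_p(m, I).

    Assumption: Sigma = I and zero-mean data, chosen only for illustration.
    """
    w = [[0.0] * p for _ in range(p)]
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                w[i][j] += z[i] * z[j]
    return w


def mean_trace_inverse(p, m, reps=200, seed=0):
    """Monte Carlo estimate of E[tr(S^{-1})] / p with S = W / m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = [[v / m for v in row] for row in wishart_identity(p, m, rng)]
        s_inv = invert(s)
        total += sum(s_inv[i][i] for i in range(p)) / p
    return total / reps


def theory(p, m):
    """Exact value m / (m - p - 1), from E[W^{-1}] = I / (m - p - 1), m > p + 1."""
    return m / (m - p - 1)


# Usage: the inflation factor grows as p approaches m = 24.
for p in (2, 8, 16):
    print(p, round(mean_trace_inverse(p, 24), 3), round(theory(p, 24), 3))
```

The simulated averages should track `theory(p, m)` up to Monte Carlo noise. Replacing the identity covariance with a general Σ is where the random matrix results cited above become necessary.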

Linear discriminant analysis
Effect of the sample mean
Effect of the sample covariance matrix
Dimension effect of LDA
Regularized linear discriminant analysis
Dimension effect of RLDA
Bias correction for RLDA
Selection of λ
Examples
Isotropic case
Sparse case
Simulations
Bias correction for LDA and RLDA
RLDA and sparse LDA