Abstract

Many contemporary studies involve classifying a subject into one of two classes based on n observations of the p variables associated with the subject. Under the assumption that the variables are normally distributed, the well-known linear discriminant analysis (LDA) assumes a common covariance matrix for the two classes, while quadratic discriminant analysis (QDA) allows different covariance matrices. When p is much smaller than n, even if both diverge, LDA and QDA have the smallest asymptotic misclassification rates in the cases of equal and unequal covariance matrices, respectively. However, modern statistical studies often face classification problems in which the number of variables p is much larger than the sample size n, and the classical LDA and QDA can perform poorly. In fact, we give an example in which QDA performs as poorly as random guessing even when the true covariance matrices are known. Under some sparsity conditions on the unknown means and covariance matrices of the two classes, we propose a sparse QDA based on thresholding that has the smallest asymptotic misclassification rate conditional on the training data. We discuss an example of classifying normal and tumor colon tissues based on a set of p = 2,000 genes and a sample of size n = 62, and another example from a cardiovascular study of n = 222 subjects with p = 2,434 genes. A simulation study is also conducted to evaluate the performance of the proposed method.
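To make the idea of a thresholding-based QDA concrete, here is a minimal sketch under stated assumptions. It is not the authors' estimator, whose exact thresholding scheme is not given in this abstract. It is a generic plug-in QDA in which the sample means and the off-diagonal entries of each class's sample covariance matrix are hard-thresholded before being used in the usual Gaussian quadratic discriminant rule. The function names and the tuning parameters t_mean and t_cov are hypothetical. The sketch also assumes that the thresholded covariance matrices remain positive definite, which hard thresholding does not guarantee in general.

```python
import numpy as np

def hard_threshold(a, t):
    """Entrywise hard thresholding: keep entries with |a_ij| > t, set the rest to 0."""
    return np.where(np.abs(a) > t, a, 0.0)

def fit_thresholded_qda(x1, x2, t_mean=0.0, t_cov=0.0):
    """Plug-in estimates for each class (rows of x1, x2 are observations).

    Hypothetical sketch: class means are hard-thresholded at t_mean, and only the
    off-diagonal covariance entries are hard-thresholded at t_cov; variances are kept.
    """
    params = []
    for x in (x1, x2):
        mu = hard_threshold(x.mean(axis=0), t_mean)
        sigma = np.cov(x, rowvar=False)
        diag = np.diag(np.diag(sigma))
        sigma_t = hard_threshold(sigma - diag, t_cov) + diag
        params.append((mu, sigma_t))
    return params

def qda_score(x, mu, sigma, prior=0.5):
    """Gaussian log-density plus log prior, dropping the constant shared by both classes."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)       # assumes sigma is positive definite
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)^T sigma^{-1} (x - mu)
    return -0.5 * logdet - 0.5 * quad + np.log(prior)

def classify(x, params):
    """Assign x to class 1 or 2, whichever has the larger quadratic discriminant score."""
    s1 = qda_score(x, *params[0])
    s2 = qda_score(x, *params[1])
    return 1 if s1 >= s2 else 2
```

For example, with two classes that share a mean but differ in spread, a point near the common center is assigned to the low-variance class, and a point far from it to the high-variance class. This difference in covariance structure is exactly what QDA can exploit and LDA cannot.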
