Abstract

This article considers the problem of sparse estimation of canonical vectors in linear discriminant analysis when p ≫ N. Several methods have been proposed in the literature that estimate one canonical vector in the two-group case. However, G − 1 canonical vectors can be considered if the number of groups is G. In the multi-group context, it is common to estimate canonical vectors in a sequential fashion. Moreover, separate prior estimation of the covariance structure is often required. We propose a novel methodology for direct estimation of canonical vectors. In contrast to existing techniques, the proposed method estimates all canonical vectors at once, performs variable selection across all the vectors and comes with theoretical guarantees of variable selection and classification consistency. First, we highlight the fact that in the N > p setting the canonical vectors can be expressed in a closed form up to an orthogonal transformation. Second, we propose an extension of this form to the p ≫ N setting and achieve feature selection by using a group penalty. The resulting optimization problem is convex and can be solved using a block-coordinate descent algorithm. The practical performance of the method is evaluated through simulation studies as well as real data applications. Supplementary materials for this article are available online.
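To make the estimation strategy concrete, the sketch below implements a generic group-penalized convex problem of the kind the abstract describes, solved by block-coordinate descent: each block is the row of coefficients that a single feature contributes to all G − 1 directions, so the group penalty zeroes a feature out of every canonical vector simultaneously. This is a minimal illustration with a least-squares data-fit term, not the authors' exact objective; the function name `group_bcd` and all parameter choices are hypothetical.

```python
import numpy as np

def group_bcd(X, Y, lam, n_iter=200, tol=1e-8):
    """Block-coordinate descent for
        min_V 0.5 * ||Y - X V||_F^2 + lam * sum_j ||V[j, :]||_2.

    Each row V[j, :] (the coefficients of feature j across the G - 1
    directions) is one block; a row zeroed by the penalty removes
    feature j from every canonical vector at once.
    """
    n, p = X.shape
    k = Y.shape[1]                      # k = G - 1 directions
    V = np.zeros((p, k))
    col_sq = (X ** 2).sum(axis=0)       # per-block curvature constants
    R = Y - X @ V                       # residual, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            v_old = V[j].copy()
            # partial residual correlation for block j
            r_j = X[:, j] @ R + col_sq[j] * v_old
            norm_rj = np.linalg.norm(r_j)
            # closed-form group soft-thresholding update
            if norm_rj <= lam:
                V[j] = 0.0
            else:
                V[j] = (1.0 - lam / norm_rj) * r_j / col_sq[j]
            diff = V[j] - v_old
            if np.any(diff):
                R -= np.outer(X[:, j], diff)
                max_change = max(max_change, np.abs(diff).max())
        if max_change < tol:
            break
    return V

# Hypothetical usage in a p >> N regime:
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 200))      # N = 40 observations, p = 200 features
Y = rng.standard_normal((40, 2))        # G - 1 = 2 target directions
V = group_bcd(X, Y, lam=5.0)
print((np.linalg.norm(V, axis=1) > 0).sum(), "features selected")
```

Because each block update has a closed-form group soft-thresholding solution and the objective is convex, the iterates decrease the objective monotonically, which is what makes block-coordinate descent a natural fit for this class of problems.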

Highlights

  • Recent technological advances have generated high-dimensional data sets across a wide variety of application areas such as finance, atmospheric science, astronomy, biology and medicine

  • Linear Discriminant Analysis (LDA) is a popular classification and data visualization tool that is used in the N > p setting

  • These linear combinations are called canonical vectors and they provide a low-dimensional representation of the data by reducing the original feature space dimension p to G − 1, where G is the total number of groups


Introduction

Recent technological advances have generated high-dimensional data sets across a wide variety of application areas such as finance, atmospheric science, astronomy, biology and medicine. Not only do these data sets pose computational challenges, but they also breed new statistical challenges, as the traditional methods are no longer sufficient. LDA seeks the linear combinations of features that maximize the between-group variability with respect to the within-group variability (Mardia et al., 1979, Chapter 11). These linear combinations are called canonical vectors, and they provide a low-dimensional representation of the data by reducing the original feature space dimension p to G − 1, where G is the total number of groups.

Other approaches that lead to sparse discriminant vectors have been considered. Tibshirani et al. (2002) propose the shrunken centroids methodology by adapting the naive Bayes classifier and soft-thresholding the mean vectors. Guo et al. (2007) combine the shrunken centroids approach with a ridge-type penalty on the within-class covariance matrix. Witten and Tibshirani (2011) apply an ℓ1 penalty to Fisher's discriminant problem in order to obtain sparse discriminant vectors. Clemmensen et al. (2011) use an optimal scoring approach, which essentially reduces the sparse discriminant vector construction to a penalized regression problem.
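As background for the classical construction referenced above, the sketch below computes canonical vectors in the N > p setting by solving the generalized eigenproblem between the between-group and within-group covariance matrices, B v = λ W v; the top G − 1 eigenvectors are the canonical vectors. The helper name `canonical_vectors` and the normalizations of B and W are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def canonical_vectors(X, y):
    """Classical Fisher canonical vectors, assuming N > p so that the
    within-group covariance W is invertible.

    Solves the generalized eigenproblem B v = lambda * W v, where B and
    W are the between-group and within-group covariance matrices; the
    top G - 1 eigenvectors are the canonical vectors.
    """
    groups = np.unique(y)
    G = len(groups)
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in groups:
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        d = (mg - grand_mean)[:, None]
        B += Xg.shape[0] * (d @ d.T)        # between-group scatter
        W += (Xg - mg).T @ (Xg - mg)        # within-group scatter
    B /= n
    W /= (n - G)
    # symmetric generalized eigenproblem; eigh returns ascending eigenvalues
    vals, vecs = eigh(B, W)
    # B has at most G - 1 nonzero eigenvalues: take them in decreasing order
    return vecs[:, ::-1][:, :G - 1]

# Hypothetical usage with three well-separated groups in p = 5 dimensions:
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((30, 5)) + m for m in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 30)
V = canonical_vectors(X, y)                 # 5 x 2 matrix of canonical vectors
```

When p ≫ N, the matrix W in this sketch is singular and the generalized eigenproblem breaks down, which is precisely the regime the article's penalized, direct-estimation approach is designed to handle.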
