Abstract
Model selection and feature selection are usually considered two separate tasks. For example, in a linear discriminant analysis (LDA) setting, a modeling assumption is typically made first (e.g., a full or a diagonal covariance matrix is chosen), and the feature subset providing the best prediction performance under that model is then selected. If only limited training data are available, the number of model parameters that can be reliably estimated is also limited. In the context of LDA, model selection entails simplifying the covariance matrix by setting some of its components to zero. This leads to different block diagonal matrix structures (with full and diagonal matrices as the two extremes), which involve different sets of features and require different parameters to be estimated. In this paper we argue that LDA feature and parameter selection should be done jointly, and we propose a greedy algorithm for the joint selection of features and of a block diagonal structure for the covariance matrix. To the best of our knowledge, this is the first time such a joint design has been proposed in the context of LDA. The choice of a block diagonal structure is motivated by microarray classification problems, where there is a very large number of features (i.e., genes) that are expected to be co-regulated in small groups. Results obtained with artificial datasets show that the algorithm can flexibly choose an adequate covariance matrix structure according to the size of the training set and the generating distribution. Our results consistently outperform those achieved with other LDA-based techniques.
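To make the link between block structure and parameter count concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a candidate structure is given as a list of feature-index groups; the function names and the example groupings are hypothetical, introduced only for illustration. It builds the pattern of covariance entries that are kept (all others are set to zero) and counts the free covariance parameters that would have to be estimated.

```python
def block_structure_mask(blocks):
    """Return the selected features and a 0/1 pattern of kept covariance entries.

    blocks: list of lists of feature indices (a hypothetical grouping).
    Entries outside the diagonal blocks are the ones set to zero.
    """
    selected = [j for block in blocks for j in block]
    p = len(selected)
    mask = [[0] * p for _ in range(p)]
    start = 0
    for block in blocks:
        stop = start + len(block)
        for r in range(start, stop):
            for c in range(start, stop):
                mask[r][c] = 1  # entry inside a diagonal block is kept
        start = stop
    return selected, mask


def n_covariance_parameters(blocks):
    # A symmetric k x k block has k * (k + 1) / 2 free parameters.
    return sum(len(b) * (len(b) + 1) // 2 for b in blocks)


# Three candidate structures over the same four features (illustrative only).
full = [[0, 1, 2, 3]]
diagonal = [[0], [1], [2], [3]]
two_blocks = [[0, 1], [2, 3]]

for name, blocks in [("full", full), ("diagonal", diagonal), ("two blocks", two_blocks)]:
    print(name, n_covariance_parameters(blocks))
```

For four features, the full structure requires 10 covariance parameters, the diagonal structure 4, and the two-block structure 6, which illustrates why a smaller training set may favor a sparser block structure.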