Abstract
Classification is an important tool with many useful applications. Fisher's linear discriminant analysis (LDA) is a traditional model‐based classification method which makes use of the Gaussian distributional information. However, in the high‐dimensional, low‐sample‐size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods for high‐dimensional data, they may not fully use the distributional information as LDA does. Hence in some situations, it is still desirable to use a model‐based method for classification. This paper exploits the potential of LDA in a more complicated data setting. In many real applications, it is costly to manually label observations; consequently, often only a small portion of labeled data is available while a large number of observations are left without labels. It is a great challenge to obtain good classification performance from the labeled data alone, especially in the high‐dimensional setting. To overcome this issue, we propose a semisupervised sparse LDA classifier that takes advantage of the seemingly useless unlabeled data, which helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve sparsity; meanwhile, we employ the difference‐convex algorithm to handle the nonconvex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help clarify when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.
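As a small illustration of why the sample covariance is not invertible in the high‐dimensional, low‐sample‐size setting, the following minimal sketch simulates data with more features than observations and checks the rank of the sample covariance. The sizes `n` and `p`, the random seed, and the simulated Gaussian data are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement the proposed classifier.

```python
import numpy as np

# Hypothetical sizes for illustration: fewer observations (n) than features (p).
n, p = 20, 50

rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))  # simulated data, one row per observation

# Sample covariance of the p features (p x p matrix).
S = np.cov(X, rowvar=False)

# Centering removes one degree of freedom, so rank(S) is at most n - 1,
# which is smaller than p here. S is therefore singular and the inverse
# required by classical LDA does not exist.
rank = np.linalg.matrix_rank(S)
print(f"covariance shape: {S.shape}, rank: {rank}, n - 1: {n - 1}")
```

Because the rank is bounded by `n - 1`, the same conclusion holds for any data set with `p >= n`, which is the situation that motivates estimating the discriminant direction directly rather than inverting the covariance.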
More From: Statistical Analysis and Data Mining: The ASA Data Science Journal