Abstract
Partial least squares discriminant analysis (PLS-DA) is a well-known technique for feature extraction and discriminant analysis in chemometrics. Despite its popularity, it has been observed that PLS-DA does not automatically lead to extraction of relevant features. Feature learning and extraction depend on how well the discriminant subspace is captured. In this paper, discriminant subspace learning of chemical data is discussed from the perspective of PLS-DA and a recent extension of PLS-DA, known as locality preserving partial least squares discriminant analysis (LPPLS-DA). The objective is twofold: (a) to introduce the LPPLS-DA algorithm to the chemometrics community and (b) to demonstrate the superior discrimination capabilities of LPPLS-DA and how it can be a powerful alternative to PLS-DA. Four chemical data sets are used: three spectroscopic data sets and one that contains compositional data. Comparative performances are measured based on discrimination and classification of these data sets. To compare the classification performances, the data samples are projected onto the PLS-DA and LPPLS-DA subspaces, and the projected samples are classified into one of the groups (classes) using the nearest-neighbor classifier. We also compare the two techniques in a data visualization (discrimination) task. The ability of LPPLS-DA to group samples from the same class while at the same time maximizing the between-class separation is clearly shown in our results. In comparison with PLS-DA, separation of data in the projected LPPLS-DA subspace is better defined.
Highlights
With the recent advances in technology, there has been an explosion in the amount of chemical data generated using advanced chemical analysis equipment.
Partial least squares discriminant analysis (PLS-DA) is a well-known technique for feature extraction and discriminant analysis in the context of chemometrics.[6−10] This method is based on the PLS algorithm, which was first introduced for regression tasks.[11,12]
The performances of the LPPLS-DA and PLS-DA methods are compared in two ways: 1. Visualization: The PLS-DA scores and the LPPLS-DA scores in the low-dimensional subspace are plotted in order to evaluate the discriminant capability of the methods.
Summary
With the recent advances in technology, there has been an explosion in the amount of chemical data generated using advanced chemical analysis equipment. These types of data sets possess characteristics such as high dimensionality and small sample size, which make classification and discrimination tasks quite challenging. Many techniques have been proposed in the past to reduce the dimensionality of the data, either by selecting the most representative features from the original ones (feature selection) or by creating new features as linear combinations of the original features (feature extraction). These techniques include principal component analysis (PCA)[2,3] and partial least squares discriminant analysis (PLS-DA),[2,4,5] to mention a few. The transformation is readily computed using the nonlinear iterative partial least squares (NIPALS) algorithm.[11,13]
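As a rough illustration of the PLS-DA pipeline described above (projection onto a low-dimensional subspace computed with NIPALS, followed by nearest-neighbor classification), the following Python sketch may help. It is not the authors' implementation and it does not cover LPPLS-DA. It assumes a standard PLS2-style NIPALS iteration with one-hot encoded class labels as the response matrix; the function names `nipals_pls` and `nearest_neighbor_predict`, the tolerance settings, and the synthetic three-class data are all hypothetical choices made for this example.

```python
import numpy as np


def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS2-style NIPALS sketch (an assumed variant, not the paper's code).

    Returns the training mean of X and a rotation matrix R such that
    (X - x_mean) @ R gives the PLS scores.
    """
    x_mean = X.mean(axis=0)
    Xr = X - x_mean                # centered predictors (deflated below)
    Yr = Y - Y.mean(axis=0)        # centered one-hot responses
    W, P = [], []
    for _ in range(n_components):
        # Start from the response column with the largest variance.
        u = Yr[:, [np.argmax(Yr.var(axis=0))]]
        t_old = np.zeros((Xr.shape[0], 1))
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)         # unit-norm X weight vector
            t = Xr @ w                     # X scores
            q = Yr.T @ t / (t.T @ t)       # Y loadings
            u = Yr @ q / (q.T @ q)         # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = Xr.T @ t / (t.T @ t)           # X loadings
        Xr = Xr - t @ p.T                  # deflate X
        Yr = Yr - t @ q.T                  # deflate Y
        W.append(w)
        P.append(p)
    W, P = np.hstack(W), np.hstack(P)
    R = W @ np.linalg.inv(P.T @ W)         # maps centered X to scores
    return x_mean, R


def nearest_neighbor_predict(T_train, y_train, T_test):
    """1-nearest-neighbor classification using Euclidean distance."""
    d2 = ((T_test[:, None, :] - T_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]


# Synthetic stand-in data (NOT from the paper): 3 classes, 10 features.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 30, 10, 3
means = np.zeros((n_classes, n_features))
means[np.arange(n_classes), np.arange(n_classes)] = 8.0
y = np.repeat(np.arange(n_classes), n_per_class)
X = means[y] + rng.normal(size=(y.size, n_features))

# Hold out every third sample for testing.
train = np.arange(y.size) % 3 != 0
X_train, y_train = X[train], y[train]
X_test, y_test = X[~train], y[~train]

# One-hot encode the training labels and fit a 2-component subspace.
Y_train = np.eye(n_classes)[y_train]
x_mean, R = nipals_pls(X_train, Y_train, n_components=2)

# Project both sets with the training mean, then classify with 1-NN.
T_train = (X_train - x_mean) @ R
T_test = (X_test - x_mean) @ R
y_pred = nearest_neighbor_predict(T_train, y_train, T_test)
accuracy = (y_pred == y_test).mean()
print("test accuracy:", accuracy)
```

The two columns of `T_train` are the scores that would be plotted for the visualization comparison. Test samples are centered with the training mean so that both sets are projected into the same subspace.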