Evaluating the performance of dimensionality-reduction and rotation techniques in a latent variable framework
This work provides a systematic evaluation of how different low-dimensional representations of the chemical state space affect the accuracy and computational performance of a latent variable (LV) framework for reactive flows. These latent representations are constructed using Principal Component Analysis (PCA) and Partial Least Squares (PLS) combined with orthogonal and oblique rotation strategies, trained using zero-dimensional reactors with the detailed AramcoMech 1.3 mechanism for CH4/air combustion, and tested on unseen conditions. Performance is assessed in terms of solution accuracy, computational speed-up and metrics describing the latent space and system dynamics. Results show that while all latent-basis configurations achieve computational savings, their numerical behaviour strongly depends on the structure of the latent space. PCA-derived bases provide the lowest reconstruction errors but only a moderate speed-up. Unrotated PLS bases exhibit strong sensitivity to the latent dimension, whereas orthogonal rotations restore stability across compression levels. Among all techniques, the PLS–Quartimax combination provides the best overall performance, yielding smooth source-term manifolds and latent variables explicitly aligned with the fast chemical time scales due to the PLS formulation. This alignment enables larger stable time steps and results in the highest computational speed-up. These findings demonstrate that appropriate combinations of dimensionality-reduction techniques and rotation strategies significantly improve the numerical behaviour of the LV solver, enabling faster chemistry integration while preserving the full thermochemical state.
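As a rough illustration of what an orthogonal rotation of a latent basis involves, the sketch below rotates PCA loadings with a generic orthomax iteration, where `gamma=0` gives the Quartimax criterion. It is not the authors' implementation: the synthetic data, the function names `pca_loadings`, `orthomax` and `quartimax_criterion`, and the stopping rule are all assumptions made for this example, and nothing here involves the reactor data or the LV solver.

```python
import numpy as np

def pca_loadings(X, k):
    """Return the first k principal directions (as columns) of centred data X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T  # shape (n_features, k)

def orthomax(A, gamma=0.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate loadings A; gamma=0 is Quartimax, gamma=1 is Varimax."""
    p, k = A.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the orthomax criterion with respect to the rotation R
        G = A.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt  # nearest orthogonal matrix to the gradient
        obj = s.sum()
        if obj < prev * (1 + tol):  # stop once the objective no longer grows
            break
        prev = obj
    return A @ R, R

def quartimax_criterion(L):
    """Sum of fourth powers of the loadings (larger means simpler structure)."""
    return np.sum(L**4)

# Hypothetical correlated data standing in for a thermochemical state matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
A = pca_loadings(X, 3)
A_rot, R = orthomax(A, gamma=0.0)
```

Because `R` is orthogonal, the rotated basis spans the same subspace as the unrotated one, so the rotation alone does not change the reconstruction error; it only redistributes the loadings among the latent variables.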
- Research Article
1
- 10.1088/1361-6501/adead0
- Jul 10, 2025
- Measurement Science and Technology
In modern industrial processes, quality-oriented monitoring presents significant challenges because of complex process characteristics such as nonlinearity and dynamics. Traditional statistical monitoring methods, such as canonical correlation analysis and partial least squares (PLS), which are widely used in industrial applications, often struggle to capture the nonlinear dynamic features among process variables. In this paper, a novel deep dynamic partial least squares (DeDPLS) based quality monitoring framework is proposed. By integrating hierarchical latent variable extraction with deep dynamic modelling, the framework enhances quality-related fault detection capability in industrial processes. First, a new dynamic latent variable extraction framework is developed by integrating dynamic PLS (DPLS) with residual subspace modelling. This framework effectively decouples latent variables into dynamic systematic and residual components, enabling a more detailed, hierarchical decomposition of process variations. Second, a multi-layer deep learning structure is constructed by stacking multiple DPLS modules, where kernel-mapped residual components propagate nonlinear features across layers, allowing iterative refinement of latent feature representations. Then, a hierarchical quality monitoring strategy is designed by fusing latent variable statistics across all layers. A global monitoring statistic is derived from the weighted contributions of each layer’s statistics. Finally, experimental validation on a hot strip mill process demonstrates the effectiveness and superiority of the DeDPLS framework.
- Research Article
5
- 10.1088/1741-2552/ab5d47
- Feb 1, 2020
- Journal of Neural Engineering
Objective. Partial Least Squares (PLS) regression is a suitable linear decoder model for correlated and high dimensional neural data. This algorithm has been widely used in the application of brain–computer interface (BCI) for the decoding of motor parameters. PLS does not consider nonlinear relations between brain signal features. The nonlinear version of PLS that considers a nonlinear relationship between the latent variables has not been proposed for the decoding of intracranial data. This nonlinear model may cause overfitting in some cases due to a larger number of free parameters. In this paper, we develop a new version of nonlinear PLS, namely nonlinear sparse PLS (NLS PLS) and test it in BCI applications. Approach. In motor related BCI systems, improving the decoding accuracy of both kinetic and kinematic parameters of movement is crucial. To do this, two BCI datasets were chosen to decode the force amplitude and position of hand trajectory using the nonlinear and sparse versions of PLS algorithm. In our new NLS PLS method, we considered a polynomial relationship between the latent variables and used the lasso penalization in the latent space to avoid overfitting and to improve the decoding accuracy. Main results. Some linear and nonlinear based PLS models and our new proposed method, NLS PLS, were applied to the two datasets. According to our results, significant improvement from the NLS PLS method is confirmed over other methods. Our results show that nonlinear PLS outperforms generic PLS in the force decoding but it has lower accuracy in the hand trajectory decoding because of high dimensional feature space. By using lasso penalization, we presented a sparse nonlinear PLS-based model that outperforms generic PLS in both datasets and improves the coefficient of determination, 34% in the force decoding and 10% in the hand trajectory decoding. Significance. 
We constructed a simple PLS-based model that considers a nonlinear relationship between features and is robust to overfitting because it uses the lasso penalty in the latent space. This model is suitable for high-dimensional and correlated datasets, such as intracranial data, and can improve the accuracy of estimation.
- Single Report
- 10.2172/2460467
- Sep 27, 2024
Multivariate approaches show promise for application to process monitoring for safeguards of pyroprocessing. Past MPACT work explored the application of Principal Component Analysis (PCA) to detect off-normal conditions in pyroprocessing electrorefiner (ER) data from the Scalable Pyrochemical Recycling testbed (SPyRe) ER in the Hot Fuel Examination Facility (HFEF) at Idaho National Laboratory (INL). PCA, however, does not consider the output variables. In FY24, multivariate analysis was extended from PCA to Partial Least Squares (PLS) analysis. PLS maximizes the covariance between the input signals and the output variables. In this work, PLS was applied in two different manners: Predictive PLS and Discriminant PLS. Predictive PLS maximizes the covariance between the process variables of the ER and the measured U concentration from in-situ voltammetry. Discriminant PLS maximizes the covariance between the process variables and a set of training process “states” such as known off-normal conditions. By projecting into the latent variable space in PLS, the process variables can be regressed onto the outputs and predictions can be made for new data sets. In this work, by applying predictive PLS, a penalized non-linear PLS approach was able to make predictions of concentration based on test and training data and detect when operations were off-normal. However, predictive PLS does not classify the signals to which off-normal operations are attributable. Discriminant PLS can be used to classify off-normal operations but is inadequate to properly classify specific off-normal classes like power supply faults when the Discriminant PLS model is only specifically trained to detect that off-normal class. When all faults are trained against the observation data, all three operational classes are accurately classified and distinguished.
Thus, future applications of latent variable techniques should not rely on any single method, but should use a mixture of PCA, Predictive PLS, and Discriminant PLS.
- Book Chapter
432
- 10.1007/978-1-62703-059-5_23
- Aug 18, 2012
Partial least square (PLS) methods (also sometimes called projection to latent structures) relate the information present in two data tables that collect measurements on the same set of observations. PLS methods proceed by deriving latent variables which are (optimal) linear combinations of the variables of a data table. When the goal is to find the shared information between two tables, the approach is equivalent to a correlation problem and the technique is then called partial least square correlation (PLSC) (also sometimes called PLS-SVD). In this case there are two sets of latent variables (one set per table), and these latent variables are required to have maximal covariance. When the goal is to predict one data table from the other one, the technique is then called partial least square regression (PLSR). In this case there is one set of latent variables (derived from the predictor table) and these latent variables are required to give the best possible prediction. In this paper we present and illustrate PLSC and PLSR and show how these descriptive multivariate analysis techniques can be extended to deal with inferential questions by using cross-validation techniques such as the bootstrap and permutation tests.
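The PLSC idea described above (two sets of latent variables with maximal covariance, obtained from a singular value decomposition) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the chapter: the function name `plsc`, the choice to standardise each column, and the synthetic tables `X` and `Y` are all hypothetical.

```python
import numpy as np

def plsc(X, Y):
    """PLS correlation: SVD of the cross-product of two column-standardised tables."""
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    R = Zx.T @ Zy                      # cross-product matrix between the two tables
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Lx = Zx @ U                        # latent variables for table X
    Ly = Zy @ Vt.T                     # latent variables for table Y
    return Lx, Ly, s

# Two synthetic tables measured on the same 50 observations, sharing one factor
rng = np.random.default_rng(1)
shared = rng.normal(size=(50, 1))
X = shared @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(50, 4))
Y = shared @ rng.normal(size=(1, 3)) + 0.5 * rng.normal(size=(50, 3))
Lx, Ly, s = plsc(X, Y)
```

By construction, the cross-product of the k-th pair of latent variables equals the k-th singular value, and latent variables from different pairs have zero cross-product, which is the "maximal covariance" property in matrix form.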
- Research Article
27
- 10.14214/sf.57
- Jan 1, 2012
- Silva Fennica
Partial Least Square (PLS) regression is a recent soft-modelling technique that generalizes and combines features from principal component analysis (PCA) and multiple regression. It is particularly useful when predicting one or more dependent variables from a large set of independent variables, often collinear. The authors compared the potential of PLS regression and ordinary linear regression for accurate modelling of forest work, with special reference to wood chipping, wood extraction and the continuous harvesting of short rotation coppice. Compared to linear regression, PLS regression allowed producing models that better fit the original data. What is more, it allowed handling collinear variables, facilitating the extraction of sound models from large amounts of field data obtained from commercial forest operations. On the other hand, PLS regression analysis is not as easy to conduct, and produces models that are less user-friendly. By producing alternative models, PLS regression may provide additional, and not alternative, ways of reading the data. Ideally, a comprehensive data analysis could include both ordinary and PLS regression and proceed from their results in order to get a better understanding of the phenomenon under examination. Furthermore, the computational complexity of PLS regression may stimulate interdisciplinary team-building, to the greater benefit of scientific research within the field of forest operations.
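To make the comparison between PLS regression and ordinary linear regression on collinear predictors more concrete, here is a small single-response PLS (PLS1) sketch. It is a generic textbook-style formulation, not the authors' analysis: the helper names `pls1_fit` and `pls1_predict` and the synthetic data (two nearly collinear predictors plus one independent predictor) are assumptions for illustration only.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by successive deflation of the centred predictor matrix."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                  # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # latent variable (score)
        tt = t @ t
        p = Xk.T @ t / tt              # X loading
        q.append(yc @ t / tt)          # regression of y on the score
        Xk = Xk - np.outer(t, p)       # deflate X before the next component
        W.append(w)
        P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original X space
    return beta, x_mean, y_mean

def pls1_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

# Synthetic data: columns 0 and 1 are strongly collinear
rng = np.random.default_rng(2)
x1 = rng.normal(size=80)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=80), rng.normal(size=80)])
y = X @ np.array([1.0, 0.5, 0.5]) + 0.1 * rng.normal(size=80)

beta_one, xm, ym = pls1_fit(X, y, 1)       # heavily compressed model
beta_full, _, _ = pls1_fit(X, y, 3)        # as many components as predictors
beta_ols = np.linalg.lstsq(X - xm, y - ym, rcond=None)[0]
```

With as many components as predictors, the PLS1 coefficients coincide with the ordinary least-squares solution; using fewer components trades some fit for coefficients that are less sensitive to the collinearity.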
- Research Article
25
- 10.1016/j.chemolab.2023.104827
- Apr 18, 2023
- Chemometrics and Intelligent Laboratory Systems
A novel regression method: Partial least distance square regression methodology
- Research Article
96
- 10.1039/c1ja10164a
- Jan 1, 2012
- J. Anal. At. Spectrom.
The objective of the current research was to compare different data-driven multivariate statistical predictive algorithms for the quantitative analysis of Fe content in iron ore measured using Laser-Induced Breakdown Spectroscopy (LIBS). The algorithms investigated were Principal Components Regression (PCR), Partial Least Squares Regression (PLS), Multi-Block Partial Least Squares (MB-PLS), and Serial Partial Least Squares Regression (S-PLS). Particular emphasis was placed on the issues of the selection and combination of atomic spectral data available from two separate spectrometers covering 208–222 nm and 300–855 nm ranges, which include many of the spectral features of interest. Standard PLS and PCR models produced similar prediction accuracy, although in the case of PLS there were notably fewer latent variables in use by the model. It was further shown that the MB-PLS and S-PLS algorithms, which both treated the available UV and VIS data blocks separately, demonstrated inferior performance in comparison with both PCR and PLS.
- Research Article
8
- 10.6092/unina/fedoa/4216
- Nov 30, 2009
- Università degli Studi di Napoli Federico II
Partial Least Squares (PLS) methods embrace a suite of data analysis techniques based on algorithms belonging to the PLS family. These algorithms consist of various extensions of the Nonlinear estimation by Iterative PArtial Least Squares (NIPALS) algorithm, which was proposed by Herman Wold as an alternative algorithm for implementing a Principal Component Analysis. The peculiarity of this algorithm is that it calculates principal components by means of an iterative sequence of simple ordinary least squares regressions. This feature allows overcoming computational problems due to missing data or landscape data matrices, i.e. matrices having more columns than rows. PLS methods were born to handle data sets forming metric spaces. This implies that all the variables embedded in the analysis are observed on interval or ratio scales. In this work we show how NIPALS-based algorithms, properly adjusted, can work as optimal scaling algorithms. This new feature of PLS, which had been until now totally unexplored, allowed us to devise a new suite of PLS methods: the Non-Metric PLS (NM-PLS) methods. NM-PLS methods can be used with different aims: to analyze at the same time variables observed on different measurement scales; to investigate non-linearity; and to discard the hard assumption of linearity in favor of a milder assumption of monotonicity. In particular, these methods generalize standard NIPALS, PLS Regression and PLS Path Modeling in such a way as to handle variables observed on a variety of measurement scales, as well as to cope with non-linearity problems. Three new algorithms are proposed to implement NM-PLS methods: the Non-Metric NIPALS algorithm, the Non-Metric PLS Regression algorithm, and the Non-Metric PLS Path Modeling algorithm. All these algorithms provide at the same time specific PLS model parameters as well as scaling values for the variables to be scaled. Scaling values provided by these algorithms are proved to be optimal, in the sense that they optimize the same criterion as the model in which they are involved. Moreover, they are suitable, since they respect the constraints depending on which of the properties of the original measurement scale we want to preserve.
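The statement that NIPALS computes principal components through an iterative sequence of simple least-squares regressions can be illustrated with the standard (metric) algorithm on complete data. This is a minimal sketch only: it does not implement the Non-Metric variants or the missing-data handling mentioned above, and the function name `nipals_pca`, the starting vector, and the synthetic data are assumptions.

```python
import numpy as np

def nipals_pca(X, n_components, max_iter=1000, tol=1e-12):
    """Principal components by NIPALS: alternate two simple regressions, then deflate."""
    Xk = X - X.mean(axis=0)
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest variance
        t = Xk[:, np.argmax(Xk.var(axis=0))].copy()
        for _ in range(max_iter):
            p = Xk.T @ t / (t @ t)     # regress each column of X on the score t
            p /= np.linalg.norm(p)     # normalise the loading vector
            t_new = Xk @ p             # regress each row of X on the loading p
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores.append(t)
        loadings.append(p)
        Xk = Xk - np.outer(t, p)       # remove the extracted component
    return np.array(scores).T, np.array(loadings).T

# Synthetic data with clearly separated variances so the iteration converges fast
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4)) * np.array([5.0, 3.0, 1.5, 0.5])
T, P = nipals_pca(X, 2)
```

Up to sign, the loadings and scores returned here match those of an SVD-based PCA of the centred data; the appeal of NIPALS is that each step is only a one-variable regression, which is what makes extensions such as the ones described above possible.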
- Research Article
31
- 10.1016/j.jfca.2020.103509
- May 18, 2020
- Journal of Food Composition and Analysis
Rapid prediction of multiple wine quality parameters using infrared spectroscopy coupling with chemometric methods
- Research Article
5
- 10.1016/j.physb.2018.08.037
- Sep 7, 2018
- Physica B: Condensed Matter
Predictors Generation by Partial Least Square Regression for microwave characterization of dielectric materials
- Dissertation
3
- 10.53846/goediss-5233
- Jan 1, 2015
A composite index is an aggregated variable comprising individual indicators and weights that commonly represent the relative importance of each indicator. Composite indices are often used to measure latent phenomena or to summarize complex information in a small number of variables. It is crucial to choose correct weights for the variables that build a composite index. Principal Component Analysis (PCA) is a popular approach to derive weights, but it may not work when informative variations account for only small variances in the variables in a composite index. Therefore, this study proposes to use Partial Least Squares (PLS), which takes advantage of the relationship between outcome variables and the variables in a composite index. Our simulation study shows that PLS either performs as well as PCA or significantly outperforms it. Additionally, in practice the variables that enter a composite index are often non-metric, which requires special treatment to apply PCA or PLS. This study reviews various PCA and PLS algorithms for non-metric variables available in the literature and compares them by means of extensive simulation studies to make recommendations for practitioners. Dummy coding often shows satisfactory performance compared to more sophisticated methods. As applications, wealth, globalization, gender inequality and corruption are quantified using composite indices based on PCA and PLS. PLS generates composite indices tailored to each respective outcome variable, often showing better performance than PCA. A comparison between PCA and PLS weights and coefficients shows which variables are particularly relevant for each respective outcome variable.
- Conference Article
1
- 10.1063/1.4953967
- Jan 1, 2016
- AIP conference proceedings
The condition of bureaucracy in Indonesia reveals many shortcomings. One of the government's bureaucratic reforms is the remuneration for Civil Servants (PNS). Remuneration is a part of the welfare received by employees, which can be used as an element of motivation for employees to excel and improve their performance. The variables in this study are interrelated. Motivation for achievement (ξ1), characteristics of the work environment (ξ2) and training transfer (ξ3) are supposed to affect performance (η1), while performance (η1) affects remuneration (η2). Both performance and remuneration are constructs or latent variables, which cannot be measured directly. Therefore, the SEM method is considered able to resolve these problems. However, SEM has some assumptions that must be met. These assumptions are frequently violated when real data are used, so we need a method that is assumption-free, distribution-free and flexible, namely variance-based SEM or partial least square (PLS). PLS is an estimation method that focuses on maximizing the variance among latent variables, which is an alternative to OLS regression. This study was conducted to model the remuneration of educational staff in ITS by using Partial Least Square (PLS) with the path scheme, centroid scheme, and factor scheme. The results show that the best method for modeling the remuneration of educational staff in ITS is PLS with the factor scheme, which yields a Q-square value of 0.7262 and R-square values of 67.69 percent and 15.28 percent for performance and remuneration respectively. The structural model obtained with factor-scheme PLS is η1 = 0.6296 ξ1 + 0.1795 ξ2 + 0.0843 ξ3 + ζ1 and η2 = 0.3909 η1 + ζ2.
- Research Article
97
- 10.1016/j.chemolab.2013.11.008
- Nov 27, 2013
- Chemometrics and Intelligent Laboratory Systems
Relationships between PCA and PLS-regression
- Research Article
21
- 10.1007/s13042-016-0500-8
- Feb 6, 2016
- International Journal of Machine Learning and Cybernetics
In reality, data objects often belong to several different categories simultaneously, which are semantically correlated to each other. Multi-label learning can handle and extract useful information from this kind of data effectively. Since it has a great variety of potential applications, multi-label learning has attracted widespread attention from many domains. However, two major challenges still remain for multi-label learning: high dimensionality and correlations of data. In this paper, we address these problems by using the technique of partial least squares (PLS) and propose a new multi-label learning method called rPLSML (regularized Partial Least Squares for Multi-label Learning). Specifically, we exploit PLS discriminant analysis to identify a latent and common space from the variable and label spaces of data, and then construct a learning model based on the latent space. To tackle the multi-collinearity problem arising from the high dimensionality, an \(\ell_2\)-norm penalty is further imposed on the optimization problem. The experimental results on public data sets show that rPLSML has better performance than the state-of-the-art multi-label learning algorithms.
- Research Article
11
- 10.3390/sym13040547
- Mar 26, 2021
- Symmetry
Multivariate statistical analysis such as partial least square regression (PLSR) is a common data processing technique used to handle the high-dimensional data space of near-infrared (NIR) spectral datasets. PLSR is useful for tackling the multicollinearity and heteroscedasticity problems that are commonly found in such data spaces. When the original input space has a nonlinear structure, the use of the classical PLSR model might not be appropriate. In addition, contamination of the dataset by multiple outliers and high leverage points (HLPs) could further damage the model. Generally, HLPs contain both good leverage points (GLPs) and bad leverage points (BLPs); therefore, in this case, removing the BLPs seems relevant since they have a significant impact on the parameter estimates and can slow down the convergence process. On the other hand, the GLPs provide good efficiency in the model calibration process; thus, they should not be eliminated. In this study, robust alternatives to the existing kernel partial least square (KPLS) regression, which are called the kernel partial robust GM6-estimator (KPRGM6) regression and the kernel partial robust modified GM6-estimator (KPRMGM6) regression, are introduced. The nonlinear solution on PLSR was handled through kernel-based learning by nonlinearly projecting the original input data matrix into a high-dimensional feature mapping that corresponded to the reproducing kernel Hilbert spaces (RKHS). To increase the robustness, the improvements on GM6 estimators are presented with the nonlinear PLSR. Based on the investigation using several artificial dataset scenarios from Monte Carlo simulations and two sets from the NIR spectral dataset, the proposed robust KPRMGM6 is found to be superior to the robust KPRGM6 and non-robust KPLS.