Abstract
The pls package implements principal component regression (PCR) and partial least squares regression (PLSR) in R (R Development Core Team 2006b), and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the GNU General Public License (GPL). The user interface is modelled after the traditional formula interface, as exemplified by lm. This was done so that people used to R would not have to learn yet another interface, and also because we believe the formula interface is a good way of working interactively with models. The package thus has methods for generic functions like predict, update and coef. It also has more specialised functions like scores, loadings and RMSEP, and a flexible cross-validation system. Visual inspection and assessment are important in chemometrics, and the pls package has a number of plot functions for plotting scores, loadings, predictions, coefficients and RMSEP estimates. The package implements PCR and several algorithms for PLSR. The design is modular, so that it should be easy to use the underlying algorithms in other functions. It is our hope that the package will serve well both for interactive data analysis and as a building block for other functions or packages using PLSR or PCR. Here we describe the package and how it is used for data analysis, as well as how it can be used as a part of other packages. Also included is a section about formulas and data frames, for people not used to the R modelling idioms.
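As an illustration of the formula interface and the methods named above, the following is a minimal sketch rather than an example taken from the paper. It assumes the pls package is installed from CRAN; the simulated data and the names X, y and dat are hypothetical and chosen only for this example.

```r
library(pls)  # assumption: the pls package has been installed from CRAN

# Hypothetical simulated data with more variables (60) than objects (40)
set.seed(1)
n <- 40
p <- 60
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:3] %*% c(1, -1, 0.5)) + rnorm(n, sd = 0.1)

# I() stores the whole matrix as a single column of the data frame,
# so that it can be referred to by one name in the model formula
dat <- data.frame(y = y, X = I(X))

# Fit PLSR and PCR models with up to 5 components and cross-validation
fit     <- plsr(y ~ X, ncomp = 5, data = dat, validation = "CV")
fit_pcr <- pcr(y ~ X, ncomp = 5, data = dat, validation = "CV")

# Generic and specialised accessor functions
b     <- coef(fit, ncomp = 3)                            # regression coefficients
pred  <- predict(fit, ncomp = 3, newdata = dat[1:5, ])   # predictions for 5 objects
sc    <- scores(fit)                                     # score matrix
ld    <- loadings(fit)                                   # loading matrix
rmsep <- RMSEP(fit)                                      # cross-validated RMSEP estimates

print(rmsep)
```

Calling plot(rmsep) would then draw the cross-validated RMSEP against the number of components, which is one common way to choose how many components to keep.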
Highlights
Multivariate regression methods like principal component regression (PCR) and partial least squares regression (PLSR) enjoy large popularity in a wide range of fields, including the natural sciences.
In principle, PLSR should have an advantage over PCR, because its latent variables are chosen with Y in mind and not only X.
Put the other way around: with the same number of latent variables, PLSR will cover more of the variation in Y and PCR will cover more of the variation in X.
Summary
Multivariate regression methods like principal component regression (PCR) and partial least squares regression (PLSR) enjoy large popularity in a wide range of fields, including the natural sciences. For example, one likes to derive molecular properties from the molecular structure. Most of these quantitative structure–activity relations (QSAR) and quantitative structure–property relations (QSPR), and in particular comparative molecular field analysis (CoMFA) (Cramer, Patterson, and Bunce 1988), use PLSR. For ordinary least squares regression, the problem often is that XᵀX is singular, either because the number of variables (columns) in X exceeds the number of objects (rows), or because of collinearities. Both PCR and PLSR circumvent this by decomposing X into orthogonal scores T and loadings P, and regressing Y not on X itself but on the first a columns of the scores T.
Other PLSR algorithms give identical results to SIMPLS in the case of one Y variable, but deviate slightly for the multivariate Y case; the differences are not likely to be important in practice.
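The decomposition and regression step described above can be written out as follows. This is a sketch in standard notation, not the paper's own formulation: the residual matrices E and F, the coefficient matrix B, and the subscript a on T and P (denoting the first a columns) are symbols assumed here for illustration.

```latex
\begin{align}
X &= T_a P_a^{\top} + E, \\
Y &= T_a B + F, \\
\hat{B} &= \left(T_a^{\top} T_a\right)^{-1} T_a^{\top} Y .
\end{align}
```

Because the columns of T are orthogonal, the matrix T_a^T T_a is diagonal and can be inverted as long as no score column is zero, even when X^T X itself is singular.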