Abstract

The pls package implements principal component regression (PCR) and partial least squares regression (PLSR) in R (R Development Core Team 2006b), and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the GNU General Public License (GPL). The user interface is modelled after the traditional formula interface, as exemplified by lm. This was done so that people used to R would not have to learn yet another interface, and also because we believe the formula interface is a good way of working interactively with models. It thus has methods for generic functions like predict, update and coef. It also has more specialised functions like scores, loadings and RMSEP, and a flexible cross-validation system. Visual inspection and assessment are important in chemometrics, and the pls package has a number of plot functions for plotting scores, loadings, predictions, coefficients and RMSEP estimates. The package implements PCR and several algorithms for PLSR. The design is modular, so that it should be easy to use the underlying algorithms in other functions. It is our hope that the package will serve well both for interactive data analysis and as a building block for other functions or packages using PLSR or PCR. We will here describe the package and how it is used for data analysis, as well as how it can be used as a part of other packages. Also included is a section about formulas and data frames, for people not used to the R modelling idioms.

Highlights

  • Multivariate regression methods like principal component regression (PCR) and partial least squares regression (PLSR) enjoy large popularity in a wide range of fields, including the natural sciences

  • PLSR should have an advantage over PCR

  • Put the other way around: with the same number of latent variables, PLSR will cover more of the variation in Y and PCR will cover more of X

Summary

Introduction

Multivariate regression methods like principal component regression (PCR) and partial least squares regression (PLSR) enjoy large popularity in a wide range of fields, including the natural sciences. For example, one often wishes to derive molecular properties from the molecular structure. Most of these quantitative structure–activity relations (QSAR) and quantitative structure–property relations (QSPR), and in particular comparative molecular field analysis (CoMFA) (Cramer, Patterson, and Bunce 1988), use PLSR. The problem often is that XᵀX, which ordinary least squares regression must invert, is singular, either because the number of variables (columns) in X exceeds the number of objects (rows), or because of collinearities. Both PCR and PLSR circumvent this by decomposing X into orthogonal scores T and loadings P, and regressing Y not on X itself but on the first a columns of the scores T. Other PLSR algorithms give identical results to SIMPLS in the case of one Y variable, but deviate slightly for the multivariate Y case; the differences are not likely to be important in practice.
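
The decomposition into scores and loadings can be made concrete with a small sketch of PCR in base R. This is only an illustration under stated assumptions, not the pls package's own implementation: the data are simulated, the number of components a = 3 is an arbitrary choice, and the variable names (Xc, Tmat, P, q, b) are made up for this example. The singular value decomposition is used here as one common way of obtaining PCR scores and loadings.

```r
# Minimal PCR sketch in base R (illustrative; NOT the pls package's code).
# All data and names below are assumptions made for this example.
set.seed(1)
n <- 20   # number of objects (rows)
p <- 50   # number of variables (columns); p > n, so X'X is singular
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.1)

# Centre X and y before decomposing
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Singular value decomposition of the centred X: Xc = U D V'
s <- svd(Xc)

a <- 3  # number of components to keep (arbitrary here)
Tmat <- s$u[, 1:a, drop = FALSE] %*% diag(s$d[1:a], nrow = a)  # scores T
P    <- s$v[, 1:a, drop = FALSE]                               # loadings P

# Regress y on the first a score columns instead of on X itself.
# The score columns are orthogonal, so T'T is diagonal and easy to invert.
q <- solve(crossprod(Tmat), crossprod(Tmat, yc))

# Express the fit as coefficients for the original (centred) variables
b <- P %*% q
fitted_y <- mean(y) + Xc %*% b
```

Because the score columns are orthogonal, the small a-by-a system can always be solved, even though the full p-by-p matrix XᵀX cannot be inverted in this setting. In practice the number of components would be chosen by cross-validation rather than fixed in advance.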

Algorithms
On the use of PLSR and PCR
Outline of the paper
Example session
Formulas and data frames
Formulas
Data frames
Fitting models
Choosing the number of components with cross-validation
Plotting
Extraction
Summaries
Predicting new observations
Selecting fit algorithms
Package design
Calling fit functions directly
Formula handling in more detail