Abstract

In multivariate data analysis, such as principal components analysis (PCA) and projections to latent structures (PLS), it is essential that the training set systems (objects) are selected to provide data with substantial information for model parametrization, and to properly represent any future situations where the multivariate model is used for predictions. In the framework of multivariate projections (PCA, SIMCA and PLS), elementary concepts of statistical design (fractional factorials and composite designs) can be used with the latent variables (PC or PLS scores) as design variables. The plan of action thus becomes: (1) problem formulation (specify aim and model, make a conceptual division of the investigated system into subsystems); (2) collection of multivariate data for each type of subsystem; (3) estimation of the practical dimensionality of the data for each type of subsystem by PC or PLS analysis; (4) use of the PC or PLS scores (t) as design variables in the combination of subsystems into systems in the training set; (5) measurement of responses (Y); (6) analysis of data by PCA or PLS; (7) interpretation of results with possible feedback to steps 1, 2 or 3. The procedures are illustrated by two problems: a structure/activity relationship for a family of peptides, and optimization of an organic synthesis with respect to system variables (solvent, substrate, co-reactant) and process variables (temperature, reactant concentrations).
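
Steps 3 and 4 of the plan can be illustrated with a minimal sketch. It is not the authors' procedure: the descriptor matrix `X` is random placeholder data, the helper names `pca_scores` and `select_by_factorial` are hypothetical, and a two-level full factorial in two PC scores stands in for the fractional factorial and composite designs mentioned in the abstract. The idea shown is only that candidate subsystems are chosen so that their scores lie near the design points.

```python
import itertools
import numpy as np


def pca_scores(X, n_components):
    """Return PC scores (t) of X after centring and unit-variance scaling."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def select_by_factorial(T):
    """For each corner of a two-level full factorial in the scores, pick the
    not-yet-chosen candidate whose range-scaled scores are closest to it."""
    lo, hi = T.min(axis=0), T.max(axis=0)
    coded = 2 * (T - lo) / (hi - lo) - 1  # map each score to [-1, 1]
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=T.shape[1])))
    available = np.ones(len(T), dtype=bool)
    chosen = []
    for corner in corners:
        dist = np.linalg.norm(coded - corner, axis=1)
        dist[~available] = np.inf  # never pick the same candidate twice
        idx = int(np.argmin(dist))
        available[idx] = False
        chosen.append(idx)
    return corners, chosen


# Placeholder data (assumption): 20 candidate subsystems, 6 descriptors each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))

T = pca_scores(X, n_components=2)        # step 3: latent variables
corners, chosen = select_by_factorial(T)  # step 4: scores as design variables

for corner, idx in zip(corners, chosen):
    print(f"design point {corner} -> candidate {idx}")
```

In practice the number of components would come from the dimensionality estimate in step 3 (for example by cross-validation) rather than being fixed at two, and the selected candidates would then be combined into the training-set systems for which the responses are measured.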
