Abstract

One of the main problems of quantitative analytical chemistry is estimating the concentration of one or more species from the values of certain physicochemical properties of the system of interest. This requires constructing a calibration model, i.e., determining the relationship between the measured properties and the concentrations. Multivariate calibration is one of the most successful applications of statistical methods to chemical data, both in analytical chemistry and in theoretical chemistry. Methods in use include Artificial Neural Networks (ANN), Nonlinear Partial Least Squares (N-PLS), Principal Components Regression (PCR) and Multiple Linear Regression (MLR). In addition to multivariate calibration methods, sample selection algorithms are used. These algorithms choose a subset of samples for the training set that adequately covers the sample space. On the other hand, modern scanning instruments typically measure a wide spectrum for each sample, generating hundreds of variables. Search algorithms have been used to identify the variables that contribute useful information about the dependent variable in the model. This paper proposes a Genetic Algorithm based on Double Chromosome (GADC) to perform both tasks, sample selection and variable selection, simultaneously. The results were compared with those of the well-known algorithms for sample and variable selection: Kennard-Stone, Partial Least Squares and the Successive Projections Algorithm. We showed that the proposed algorithm can obtain better calibration models in a case study involving the determination of protein content in wheat samples.

Highlights

  • The term multivariate calibration refers to the construction of a mathematical model to estimate a quantity of interest on the basis of measured values of a set of explanatory variables (Soares et al, 2014; De Paula et al, 2014; Soares et al, 2010b)

  • Among the traditional techniques for constructing this model is Multiple Linear Regression (MLR), in which the data are modelled using linear predictor functions and the unknown model parameters are estimated from the data

  • The data set for the multivariate calibration study is the same one used by Soares et al (2010a). It consists of 755 visible near-infrared spectra of whole-kernel wheat samples, which were initially used as shoot-out data at the 2008 International Diffuse Reflectance Conference; protein content is chosen as the property of interest

Summary

Introduction

The term multivariate calibration refers to the construction of a mathematical model to estimate a quantity of interest on the basis of measured values of a set of explanatory variables (Soares et al, 2014; De Paula et al, 2014; Soares et al, 2010b). Among the traditional techniques for constructing this model is Multiple Linear Regression (MLR), in which the data are modelled using linear predictor functions and the unknown model parameters are estimated from the data. Given a data set of N samples, each with p explanatory variables:

$$
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},
\qquad
X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,p} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,p} \end{bmatrix}
$$

A linear regression model assumes that the relationship between the dependent variable y_n and the p-vector of regressors x_n (n = 1, ..., N) is linear.
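As a rough illustration of the MLR model described above, the following sketch fits the coefficients by ordinary least squares with NumPy. It is not the authors' implementation: the helper names (`fit_mlr`, `predict_mlr`, `rmse`), the intercept term, and the synthetic data standing in for spectra and protein content are all assumptions made for this example.

```python
import numpy as np


def fit_mlr(X, y):
    """Estimate MLR coefficients by ordinary least squares.

    X has one row per sample and one column per explanatory variable.
    A column of ones is prepended so that coef[0] is the intercept
    (including an intercept is an assumption of this sketch).
    """
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict the dependent variable for the rows of X."""
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data (hypothetical, for illustration only):
# 50 samples, 3 explanatory variables, a known linear relationship
# plus a small amount of Gaussian noise.
rng = np.random.default_rng(0)
n_samples, n_vars = 50, 3
X = rng.normal(size=(n_samples, n_vars))
true_coef = np.array([2.0, 0.5, -1.0, 3.0])  # intercept followed by slopes
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.01, size=n_samples)

coef = fit_mlr(X, y)
error = rmse(y, predict_mlr(coef, X))
```

In a calibration setting, the rows of `X` would be the selected samples and its columns the selected variables, so a sample or variable selection step amounts to choosing which rows and columns enter `fit_mlr`.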
