The use of Fourier transform infrared (FTIR) spectroscopy combined with multivariate analysis for sample classification has been extensively explored in the literature. However, the overall accuracy reported in most studies raises questions about reliability, because technical limitations and misuse of the analysis can lead to overfitting or underfitting of the data. Established procedures exist to avoid overfitting and underfitting, but studies of the relationship between the limitations of the data provided by FTIR and the method of analysis used for sample classification are lacking. In this study, FTIR spectra were obtained from thick films of poly(vinyl alcohol) (PVA)/poly(vinyl pyrrolidone) (PVP) polymeric blends with PVP concentrations below 3% (wt./wt.). We analyzed the FTIR spectra with simple classification algorithms to evaluate the accuracy of the predictive model at low concentrations of PVP in the PVA matrix. The raw data were pre-treated by standard normal variate (SNV), and different spectral ranges were explored by principal component analysis (PCA). The PCA and FTIR-SNV data were then used for supervised analysis with linear discriminant analysis (LDA), k-nearest neighbors (KNN), and support vector machine (SVM) algorithms. With a proper choice of spectral range and number of principal components (PCs), the method classified samples containing 0.1 wt% of PVP with an overall accuracy of 100% using a quadratic SVM. Finally, we show that FTIR combined with multivariate analysis can be used for sample classification when concentration changes are small.
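As an illustration of the SNV pre-treatment mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of SNV, in which each spectrum is centered on its own mean and divided by its own standard deviation; the use of the sample standard deviation (n - 1), the function name `snv`, and the absorbance values are assumptions made for illustration only.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean
    and scale it by its own standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(a - mu) / sigma for a in spectrum]


# Hypothetical absorbance values, not data from the study.
raw = [0.12, 0.15, 0.40, 0.22, 0.11]
scaled = snv(raw)

# A copy of the same spectrum with a different baseline offset and gain.
shifted = snv([2.0 * a + 0.3 for a in raw])
```

Because SNV is applied to each spectrum separately, a constant baseline offset or a multiplicative gain (for example from film thickness) is removed, so `scaled` and `shifted` agree up to floating-point error.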