In this study, a fast quantitative analytical model for the magnesium oxide (MgO) content of the medicinal mineral talcum was developed based on near-infrared (NIR) spectroscopy. The MgO content of each sample was determined by ethylenediaminetetraacetic acid (EDTA) titration and taken as the reference value for NIR spectroscopy, and a variety of spectral data processing methods were then compared to establish a reliable NIR spectroscopy model. First, 50 batches of talcum samples were divided into a training set and a test set using the Kennard-Stone (K-S) algorithm. In a partial least squares regression (PLSR) model, both leave-one-out cross-validation (LOOCV) and training set validation (TSV) were used to screen spectral preprocessing methods, including multiplicative scatter correction (MSC), and the standard normal variate (SNV) transformation was finally chosen as the optimal pretreatment method. The modeling spectral bands and the number of ranks were optimized using the PLSR method; the characteristic spectral ranges were determined to be 11995-10664, 7991-6661, and 4326-3999 cm⁻¹, and the optimal number of ranks was four. In the support vector machine (SVM) model, the radial basis function (RBF) kernel was used. Three kinds of data were taken as input variables of the SVM: the full-spectrum data of samples pretreated with SNV, the characteristic spectral data screened using synergy interval partial least squares (SiPLS), and the score data of the first four ranks obtained by partial least squares (PLS) dimension reduction of the characteristic spectra. The MgO reference values of the samples were taken as output values. In addition, the internal parameters of the SVM model were optimized using the grid optimization method (GRID), particle swarm optimization (PSO), and a genetic algorithm (GA) to determine the optimal C and g values, and the validation model was then established.
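The two data-preparation steps named above, SNV pretreatment and Kennard-Stone sample selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `snv` and `kennard_stone`, the use of Euclidean distance, and the sample standard deviation (n - 1) in SNV are assumptions made for illustration only.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # assumption: sample standard deviation (n - 1)
    return [(x - mean) / std for x in spectrum]


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by Kennard-Stone (n_select >= 2)."""
    n = len(samples)
    # Pairwise Euclidean distances between all samples (assumed distance metric).
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy one-dimensional "spectra", invented purely for demonstration.
    toy = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]]
    print(kennard_stone(toy, 3))
    print(snv([1.0, 2.0, 3.0, 4.0]))
```

The selected indices would form the training set and the unselected samples the test set; SNV is applied to each spectrum independently, which is why it needs no information from other samples.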
A comprehensive comparison of the validation results of the different models showed that the optimal NIR spectroscopy quantitative model for talcum was the PLS-SVM regression model established using GRID, with the score data of the first four ranks obtained by PLS dimension reduction of the characteristic spectra taken as SVM input variables. For this PLS-SVM regression model (rank = 4), the MgO content of talcum was in the range of 17.42-33.22%, with a root mean square error of cross-validation (RMSECV) of 2.2127%, a root mean square error of calibration (RMSEC) of 0.6057%, and a root mean square error of prediction (RMSEP) of 1.2901%. The model showed high accuracy and strong predictive capacity and can be used for rapid prediction of the MgO content of talcum.
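The error figures above are all root mean square errors computed on different sample sets, and a grid search picks the (C, g) pair that minimizes such an error. The sketch below shows both ideas in plain Python. The function names `rmse` and `grid_search`, the simple 1/n denominator, the power-of-two grid, and the stand-in error function are assumptions; the source does not state the exact formula variant or the grid used, and a real workflow would compute the error by cross-validating an SVM for each pair.

```python
import itertools
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values (1/n form)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def grid_search(cv_error, c_grid, g_grid):
    """Evaluate every (C, g) pair and return the pair with the lowest error."""
    return min(itertools.product(c_grid, g_grid), key=lambda cg: cv_error(*cg))


if __name__ == "__main__":
    # Hypothetical stand-in for a cross-validation error surface; in practice
    # this callable would train and cross-validate an SVM for each (C, g).
    def toy_cv_error(c, g):
        return (math.log2(c) - 3) ** 2 + (math.log2(g) + 2) ** 2

    powers_of_two = [2 ** k for k in range(-5, 6)]  # assumed grid, not from the source
    print(grid_search(toy_cv_error, powers_of_two, powers_of_two))
    print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Applied to training-set predictions the same `rmse` function gives an RMSEC-style value, to left-out cross-validation predictions an RMSECV-style value, and to test-set predictions an RMSEP-style value.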