Abstract

Starch content in cattle feces can be used to predict starch digestibility. Near-infrared spectroscopy (NIRS) can predict fecal starch content rapidly and without chemical analysis once a calibration model has been developed. Recent advances in machine learning have introduced powerful methods for developing optimized models. In addition, diverse preprocessing methods, such as scatter correction and smoothing, are available to remove irrelevant sources of variance. This study aimed to identify the optimal spectral preprocessing and regression methods for predicting fecal starch content from NIR spectra. NIR spectra were measured for 196 fecal samples collected from 9 fattening cattle farms. Fecal starch content on a dry matter (DM) basis was measured using enzymatic reactions and colorimetry. The 196 samples were divided into a calibration set (n = 123), a validation set (n = 29) comprising four farms, and an external validation set (n = 44) comprising five farms that differed from those in the calibration set. Using the calibration set, preprocessing was selected by two methods, both based on a partial least squares (PLS) model. In method 1, calibration models were developed by grid search with 5-fold cross-validation on each of 1,152 preprocessed versions of the calibration set, and the preprocessing with the lowest root mean square error (RMSE) of cross-validation was selected. In method 2, the original calibration set (n = 123) was divided into a calibration set (n = 116) and a preprocessing selection set (n = 7) whose samples had different neutral detergent fiber contents but similar starch contents. After preprocessing, calibration models were developed as described in method 1. The preprocessing selection set was then predicted by each developed model, and the preprocessing with the lowest RMSE on this set was selected.
Using the preprocessing selected by methods 1 and 2, the validation and external validation sets were predicted with seven regression models. The preprocessing selected by method 1 comprised a wavelength range of 400–2500 nm, robust normal variate (RNV) scatter correction, and first-order derivative Savitzky-Golay filtering (2nd-order polynomial, 2-nm window size). The preprocessing selected by method 2 comprised a wavelength range of 1500–2500 nm, RNV scatter correction, and 2nd-order derivative Savitzky-Golay filtering (3rd-order polynomial, 25-nm window size). Among the seven regression models, the Lasso model gave the most accurate predictions. The Lasso models based on the preprocessing selected by methods 1 and 2 had RMSEs of external validation of 2.054% of DM and 1.540% of DM, respectively. In the Lasso model based on the preprocessing selected by method 2, feature importance was higher at wavelengths of 1775 nm and 2325 nm. In conclusion, seven models were developed and evaluated with diverse preprocessing methods using NIRS, and the Lasso model exhibited the highest accuracy.
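A pipeline of the kind selected by method 2 (RNV scatter correction, a 2nd-order derivative Savitzky-Golay filter with a 3rd-order polynomial, then Lasso regression) might be sketched as follows. Several points are assumptions rather than details from the study: RNV is taken here to mean centering each spectrum by its median and scaling by its interquartile range, the filter window is given in data points rather than nanometers, the spectra are synthetic, the Lasso penalty `alpha` is arbitrary, and absolute Lasso coefficients stand in for feature importance.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rnv(X):
    """Row-wise robust scatter correction (assumed variant):
    subtract the median of each spectrum and divide by its IQR."""
    med = np.median(X, axis=1, keepdims=True)
    q75 = np.percentile(X, 75, axis=1, keepdims=True)
    q25 = np.percentile(X, 25, axis=1, keepdims=True)
    return (X - med) / (q75 - q25)


def preprocess(X, window=11, polyorder=3, deriv=2):
    """RNV followed by a Savitzky-Golay derivative along the wavelength axis."""
    return savgol_filter(rnv(X), window_length=window,
                         polyorder=polyorder, deriv=deriv, axis=1)


# Synthetic spectra (assumption): a band near 2100 nm scales with y, and each
# sample has its own multiplicative gain and additive offset (scatter).
rng = np.random.default_rng(1)
wavelengths = np.arange(1500, 2500, 10)
n = 80
y = rng.uniform(0, 20, n)
base = 0.5 + 0.3 * np.sin(wavelengths / 300.0)
peak = np.exp(-0.5 * ((wavelengths - 2100) / 40.0) ** 2)
gain = rng.uniform(0.8, 1.2, (n, 1))
offset = rng.normal(0, 0.1, (n, 1))
X = gain * (base + 0.02 * y[:, None] * peak) + offset \
    + rng.normal(0, 0.001, (n, wavelengths.size))

X_pp = preprocess(X)

# Simple split into a fitting subset and a held-out subset.
train, test = slice(0, 60), slice(60, n)
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000))
model.fit(X_pp[train], y[train])
pred = model.predict(X_pp[test])
rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))

# Absolute coefficients as a proxy for feature importance (assumption).
coef = model.named_steps["lasso"].coef_
top = wavelengths[np.argsort(np.abs(coef))[::-1][:3]]
print("held-out RMSE:", rmse)
print("wavelengths with largest |coefficient| (nm):", top)
```

Because Lasso shrinks many coefficients to exactly zero, the wavelengths that retain nonzero coefficients give a compact view of which spectral regions drive the prediction.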