The use of milk Fourier transform mid-infrared (FT-MIR) spectrometry to develop management and breeding tools for dairy farmers and industry is growing, supported by the availability of numerous new predicted phenotypes that assess not only the nutritional quality of milk and its technological properties, but also the animal's health and welfare status and its environmental fingerprint. For genetic evaluations, having a long-term and representative spectral dairy herd improvement (DHI) database improves the reliabilities of estimated breeding values (EBV) from these phenotypes. Unfortunately, most of the time, the raw spectral data used to generate these estimations are not stored. Moreover, many reference measurements of those phenotypes, needed during the FT-MIR calibration step, are available from past research activities but lack spectral records, so they cannot be used to improve the FT-MIR models. Consequently, there is strong interest in imputing those missing spectra. The innovative objective of this study was to use the existing large spectral DHI database to estimate missing spectra by selecting probable spectra, using as match criteria common dairy traits that DHI organizations have recorded for a long time. We tested 4 match criteria combinations. Combination 1 required equal fat and protein contents between the sample for which a spectrum was to be estimated and the reference samples in the DHI database. Combination 2 additionally required an equal urea content. Combination 3 required equal fat, protein, and lactose contents. Finally, combination 4 included all criteria. When more than one spectrum was found during the search, their average was used as the estimated spectrum for the query sample. Concretely, this study estimated missing spectra for 1,700 samples using 2,000,000 spectral DHI records.
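The matching procedure described above (exact equality on the chosen traits, then averaging of all matched spectra) can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the field names (`fat`, `protein`, `spectrum`), the helper names, and the toy values are all hypothetical, and the sketch assumes trait values are stored at a fixed recorded precision so that exact equality is meaningful.

```python
from collections import defaultdict


def build_index(records, keys):
    """Group reference spectra by the tuple of matching-trait values.

    Assumption: trait values are stored at their recorded precision,
    so comparing them for exact equality is meaningful.
    """
    index = defaultdict(list)
    for rec in records:
        index[tuple(rec[k] for k in keys)].append(rec["spectrum"])
    return index


def estimate_spectrum(query, index, keys):
    """Return the point-wise mean of all matched spectra, or None if no match."""
    matches = index.get(tuple(query[k] for k in keys))
    if not matches:
        return None
    n = len(matches)
    # zip(*matches) iterates over wavenumbers; each column holds one
    # absorbance value per matched spectrum.
    return [sum(column) / n for column in zip(*matches)]


# Hypothetical toy reference database (three spectra of three points each).
reference = [
    {"fat": 4.10, "protein": 3.40, "spectrum": [0.10, 0.20, 0.30]},
    {"fat": 4.10, "protein": 3.40, "spectrum": [0.30, 0.40, 0.50]},
    {"fat": 3.80, "protein": 3.20, "spectrum": [0.90, 0.80, 0.70]},
]

# Combination 1 in the text: equal fat and protein contents.
keys = ("fat", "protein")
index = build_index(reference, keys)

estimated = estimate_spectrum({"fat": 4.10, "protein": 3.40}, index, keys)
unmatched = estimate_spectrum({"fat": 5.00, "protein": 3.00}, index, keys)
```

Adding a trait to `keys` (for example a hypothetical `urea` field) makes the criteria stricter, so more queries return `None`. This is the situation reported for the combinations that include urea.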
To assess the effect of this spectral estimation on prediction quality, FT-MIR equations were used to predict 11 phenotypes, selected because their quantification relies on different FT-MIR regions. They were related to the milk fat and mineral composition, lactoferrin content, quantity of eructed methane, body weight (BW), and dry matter intake. The agreement between predictions obtained from actual and estimated spectra was evaluated by calculating the mean absolute error (MAE). The criteria in the second and fourth combinations were too strict to estimate a spectrum for most samples: for many samples, no spectrum with the same values for those matching criteria was found. The third match criteria combination had a poorer prediction performance than the first combination for all studied traits and spectral absorptions, because fewer matched samples were available to compute the missing spectrum. By allowing a range for matching lactose content (±0.1 g/dL milk), we showed that this new combination increased the number of selected samples used to compute missing spectra and better predicted the infrared absorption at different wavenumbers, especially those related to lactose quantification. The prediction performance was further improved by performing queries on the entire Walloon DHI spectral database (6,625,570 spectra), and it varied among the studied phenotypes. Excluding the traits used for the matching, the best predictions were obtained for the content of saturated fatty acids (MAE = 0.15 g/dL milk) and BW (MAE = 12.80 kg). Yet, the predictions for the unsaturated fatty acids were less accurate (MAE = 0.13 and 0.018 g/dL milk for monounsaturated and polyunsaturated fatty acids, respectively), likely because of the poorer predictions of spectral regions related to long-chain fatty acids. Similarly, poorer predictions were observed for the amount of methane eructed by dairy cows (MAE = 47.02 g/d), likely because it is not directly related to fat content or composition.
Prediction accuracies for the remaining traits were also low. In conclusion, we observed that increasing the number of relevant matching criteria and the number of spectra used during the search helps improve the quality of FT-MIR-predicted phenotypes. It would therefore be of great interest to test, in the future, the suitability of the developed methodology with large-scale international spectral databases to improve the reliability of EBV from these FT-MIR-based phenotypes and the robustness of FT-MIR predictive models.
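The lactose tolerance window (±0.1 g/dL milk) and the MAE comparison between predictions from actual and estimated spectra can be sketched as follows. This is only an illustration under stated assumptions: the function names, record fields, and toy numbers are hypothetical and do not come from the study, and the small epsilon added to the tolerance is our own choice to absorb floating-point rounding at the edge of the window.

```python
def match_with_tolerance(query, records, exact_keys, range_key, tol):
    """Select spectra whose exact_keys equal the query's and whose
    range_key lies within +/- tol of the query value."""
    matches = []
    for rec in records:
        same = all(rec[k] == query[k] for k in exact_keys)
        # 1e-9 is an assumed epsilon that absorbs floating-point rounding
        # for values sitting exactly on the edge of the window.
        close = abs(rec[range_key] - query[range_key]) <= tol + 1e-9
        if same and close:
            matches.append(rec["spectrum"])
    return matches


def mean_spectrum(spectra):
    """Point-wise average of a non-empty list of equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def mean_absolute_error(actual, estimated):
    """MAE between two equal-length, non-empty sequences of predictions."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


# Hypothetical toy reference records.
reference = [
    {"fat": 4.1, "protein": 3.4, "lactose": 4.80, "spectrum": [0.10, 0.20]},
    {"fat": 4.1, "protein": 3.4, "lactose": 4.85, "spectrum": [0.30, 0.40]},
    {"fat": 4.1, "protein": 3.4, "lactose": 5.10, "spectrum": [0.90, 0.90]},
]
query = {"fat": 4.1, "protein": 3.4, "lactose": 4.78}

matched = match_with_tolerance(
    query, reference, exact_keys=("fat", "protein"), range_key="lactose", tol=0.1
)
estimated = mean_spectrum(matched)

# Hypothetical predictions of one trait (e.g. BW in kg) for two samples,
# obtained from the actual spectra and from the estimated spectra.
pred_actual = [650.0, 700.0]
pred_estimated = [640.0, 715.0]
mae = mean_absolute_error(pred_actual, pred_estimated)
```

With exact lactose matching, this toy query would find no reference record at all. The tolerance window recovers two of the three, which mirrors the increase in selected samples described in the text.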