Accurate early diagnosis of pregnancy is important for timely reproductive management on dairy farms. Fourier-transform mid-infrared (FT-MIR) milk spectral data are routinely used to determine milk components such as fat and protein, and milk composition is known to change as pregnancy advances. The objectives of this study were to compare partial least squares discriminant analysis (PLS-DA) and a Bayesian variable selection regression model (BayesC) for diagnosing pregnancy status (PS) from milk FT-MIR data, and to infer which spectral regions might be strongly associated with PS at various stages of pregnancy. Conception dates of confirmed-pregnant Holstein cows were obtained from 123 herds in Michigan, Ohio, and Indiana during 2018 and 2019. Milk samples from these pregnant cows at 7 stages of pregnancy were case-control matched to samples from open contemporary herd mates collected on the same test date and at the same stage of lactation (±10 d in milk). FT-MIR data were obtained for all of these milk samples. Ten-fold herd-independent cross-validation was used to compare PLS-DA with BayesC based on the area under the receiver operating characteristic curve (AUC). BayesC had a higher mean AUC than PLS-DA at all stages beyond 60 d of pregnancy. The mean BayesC AUC at stage 1 (1-30 d) was 0.58 ± 0.02, better than a random guess (AUC = 0.50) but too low to be of practical use. The mean BayesC AUC at stage 7 (≥180 d) was 0.13 greater than that at stage 1 and 0.07 to 0.10 greater than those at stages 2, 3, 4, 5, and 6 (31-180 d, in 30-d increments). The mean AUCs at stages 2 to 6 were 0.03 to 0.06 greater than at stage 1 but likewise too low to be of practical use.
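Herd-independent cross-validation means that every record from a given herd falls in the same fold, so a model is never tested on a herd it was trained on. The sketch below illustrates that idea together with an AUC computed as the Mann-Whitney probability that a randomly chosen pregnant sample scores higher than a randomly chosen open one (ties count one half), which equals the area under the ROC curve. It is not the authors' code: the round-robin assignment of shuffled herds to folds is an assumption, and `fit_predict` is a hypothetical placeholder for fitting PLS-DA or BayesC on training spectra and scoring test spectra.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: share of (pregnant, open) pairs
    in which the pregnant (label 1) sample scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one pregnant and one open sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def herd_folds(herd_ids: Sequence[str], k: int = 10, seed: int = 0) -> list[int]:
    """Assign whole herds to k folds (shuffled round-robin, an assumption) so
    that no herd contributes records to both training and test sets."""
    herds = sorted(set(herd_ids))
    random.Random(seed).shuffle(herds)
    fold_of_herd = {h: i % k for i, h in enumerate(herds)}
    return [fold_of_herd[h] for h in herd_ids]


def herd_independent_cv_auc(X, y, herd_ids, fit_predict: Callable,
                            k: int = 10, seed: int = 0) -> list[float]:
    """Return one AUC per fold. `fit_predict(X_train, y_train, X_test)` is a
    hypothetical stand-in for training PLS-DA or BayesC and scoring X_test."""
    folds = herd_folds(herd_ids, k, seed)
    aucs = []
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        y_test = [y[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUC is undefined when a fold holds only one class
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        aucs.append(auc(scores, y_test))
    return aucs


if __name__ == "__main__":
    # Toy data: 20 herds, each with matched pregnant (1) and open (0) records
    rng = random.Random(1)
    herd_ids = [f"herd{h}" for h in range(20) for _ in range(10)]
    y = [i % 2 for i in range(len(herd_ids))]
    X = [[label + rng.gauss(0, 2)] for label in y]

    def first_feature(X_train, y_train, X_test):
        # Hypothetical "model": score each test sample by its single feature
        return [row[0] for row in X_test]

    print(round(mean(herd_independent_cv_auc(X, y, herd_ids, first_feature)), 2))
```

Because case-control matching places pregnant and open cows in the same herd and test date, each herd-level fold normally contains both classes, which keeps the per-fold AUC defined.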
Because many adjacent wavenumbers are highly multicollinear, a spatially constrained clustering algorithm was used to adaptively partition the wavenumbers into 68 windows before inferring associations between spectral regions and pregnancy. Pregnancy status was strongly associated with the wavenumber windows 1,063 to 1,134 cm⁻¹, 1,201 to 1,257 cm⁻¹, and 1,260 to 1,432 cm⁻¹, each of which had an estimated BayesC posterior probability of association (PPA) approaching 100% at all pregnancy stages. Other windows, spanning 1,730 to 1,764 cm⁻¹, 1,775 to 1,992 cm⁻¹, 1,995 to 2,163 cm⁻¹, and 2,167 to 2,316 cm⁻¹, had moderate to high PPA (30% to 100%) that varied across stages. The estimated PPA for the regions 1,477 to 1,507 cm⁻¹ and 1,510 to 1,574 cm⁻¹ was lower in stages 1 and 2 than in later stages, whereas for the regions 2,984 to 3,077 cm⁻¹ and 3,081 to 3,133 cm⁻¹ the effect of pregnancy was greater at stage 1 than at other stages. Although we conclude that milk FT-MIR data diagnose PS poorly, our study provides new insights into spectral regions that are strongly associated with PS and warrant greater attention.
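Two ideas in the paragraph above can be made concrete: windows that contain only adjacent wavenumbers, and a window-level PPA. The sketch below is a simplified illustration, not the authors' adaptive clustering algorithm. The hypothetical `contiguous_windows` function starts a new window wherever the correlation between a wavenumber and its left neighbour falls below an assumed threshold `min_abs_corr`. The hypothetical `window_ppa` function assumes a common definition for Bayesian window inference: a window's PPA is the share of posterior (MCMC) samples in which at least one wavenumber in the window has a nonzero effect. The 0/1 inclusion indicators are assumed to come from a BayesC sampler that is not shown.

```python
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5


def contiguous_windows(columns: Sequence[Sequence[float]],
                       min_abs_corr: float = 0.9) -> list[tuple[int, int]]:
    """Hypothetical stand-in for spatially constrained clustering: columns are
    wavenumbers in spectral order, and a window is cut wherever |r| with the
    left neighbour drops below min_abs_corr. Returns inclusive index pairs."""
    if not columns:
        return []
    windows, start = [], 0
    for j in range(1, len(columns)):
        if abs(pearson(columns[j - 1], columns[j])) < min_abs_corr:
            windows.append((start, j - 1))
            start = j
    windows.append((start, len(columns) - 1))
    return windows


def window_ppa(inclusion_samples: Sequence[Sequence[int]],
               windows: Sequence[tuple[int, int]]) -> list[float]:
    """Window PPA (assumed definition): fraction of posterior draws in which
    any wavenumber inside the window is included (indicator = 1)."""
    n = len(inclusion_samples)
    return [sum(any(draw[s:e + 1]) for draw in inclusion_samples) / n
            for s, e in windows]


if __name__ == "__main__":
    # Four toy wavenumber columns measured on four milk samples
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2], [8, 2, 6, 4]]
    wins = contiguous_windows(cols)
    # Four toy MCMC draws of 0/1 inclusion indicators, one per wavenumber
    draws = [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
    print(wins, window_ppa(draws, wins))
```

Summing evidence over a window rather than single wavenumbers matters here because, with highly collinear neighbours, BayesC may include any one of several adjacent wavenumbers in a given draw, which spreads single-wavenumber PPA thinly even when the region as a whole is clearly associated.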