Monitoring body condition score (BCS) is a useful management tool to estimate the energy reserves of an individual cow or a group of cows. The aim of this study was to develop and evaluate the performance of a fully automated 2-dimensional imaging system using a machine learning algorithm to generate real-time BCS for dairy cows. Two separate datasets were used for training and testing. The training dataset included 34,150 manual BCS (MAN_BCS) assigned by 5 experienced veterinarians during 35 visits at 7 dairy farms. Ordinal regression methods and deep learning architecture were used when developing the algorithm. Subsequently, the testing dataset was used to evaluate the developed BCS prediction algorithm on 4 of the participating farms. An experienced human assessor (HA1) visited these farms and performed 8 whole-milking-herd BCS sessions. Each farm was visited twice, allowing for 30 d (±2 d) to pass between visits. The MAN_BCS assigned by HA1 were considered the ground truth data. At the end of the validation study, MAN_BCS were merged with the stored automated BCS (AI_BCS), resulting in a testing dataset of 9,657 single BCS. A total of 3,817 cows in the testing dataset were scored twice 30 d (±2 d) apart, and the change in their BCS (ΔBCS) was calculated. A subset of cows at one farm were scored twice on consecutive days to evaluate the within-observer agreement of both the human assessor and the system. The manual BCS of 2 more assessors (HA2 and HA3) were used to assess the interobserver agreement between humans. Finally, we also collected ultrasound measurements of backfat thickness (BFT) from 111 randomly selected cows with available MAN_BCS and AI_BCS. Using the testing dataset, intra- and interobserver agreement for single BCS and ΔBCS were estimated by calculating the simple percentage agreement (PA) at 3 error levels and the weighted kappa (κw) for the exact agreement. 
A Bland-Altman plot was constructed to visualize the systematic and proportional bias. The association between MAN_BCS and AI_BCS and the BFT was assessed with Passing-Bablok regressions. The system had almost perfect repeatability, with a κw of 0.99. The agreement between MAN_BCS and AI_BCS was substantial, with an overall κw of 0.69. The overall PA between MAN_BCS and AI_BCS at the exact, ±0.25-unit, and ±0.50-unit BCS error ranges was 44.4%, 84.6%, and 94.8%, respectively, which was greater than the PA obtained between HA1 and HA3. The Bland-Altman plot revealed a minimal systematic bias of -0.09 with a proportional bias at the extreme scores. Furthermore, despite the low κw of 0.20, the overall PA between MAN_BCS and AI_BCS for ΔBCS at the exact and ±0.25-unit BCS error ranges was 45.7% and 88.2%, respectively. A strong linear relationship was observed between BFT and AI_BCS (ρ = 0.75), although weaker than that between BFT and MAN_BCS (ρ = 0.91). The system was able to predict single BCS and ΔBCS with satisfactory accuracy, comparable to that obtained between trained human scorers.
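The agreement statistics named above (PA at three error levels, κw, and the mean difference underlying a Bland-Altman plot) can be illustrated with a minimal Python sketch. It is not the study's code. The function names, the 1.00 to 5.00 scale in 0.25-unit steps, the linear kappa weights, the sign convention for the bias (automated minus manual), and the toy scores are all assumptions made for illustration; the abstract does not state the weighting scheme or the direction of the reported bias.

```python
from collections import Counter

# Assumed scoring scale: 1.00 to 5.00 in 0.25-unit steps (not stated in the abstract).
BCS_SCALE = [1.0 + 0.25 * i for i in range(17)]


def percentage_agreement(ref, test, tol=0.0):
    """Percentage of paired scores whose absolute difference is within tol."""
    eps = 1e-9  # guards against floating-point rounding at the tolerance edge
    hits = sum(1 for r, t in zip(ref, test) if abs(r - t) <= tol + eps)
    return 100.0 * hits / len(ref)


def weighted_kappa(ref, test, categories=BCS_SCALE):
    """Cohen's weighted kappa with linear disagreement weights (assumed scheme)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(ref)
    observed = Counter((idx[r], idx[t]) for r, t in zip(ref, test))
    row = Counter(idx[r] for r in ref)   # marginal counts, first rater
    col = Counter(idx[t] for t in test)  # marginal counts, second rater
    obs_dis = 0.0  # weighted observed disagreement
    exp_dis = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[(i, j)] / n
            exp_dis += w * row[i] * col[j] / (n * n)
    if exp_dis == 0.0:
        return 1.0  # both raters used one and the same category throughout
    return 1.0 - obs_dis / exp_dis


def mean_bias(ref, test):
    """Mean difference (test minus ref), the centre line of a Bland-Altman plot."""
    return sum(t - r for r, t in zip(ref, test)) / len(ref)


# Toy scores for illustration only; these are not data from the study.
man_bcs = [2.5, 3.0, 3.25, 3.5, 2.75, 3.0, 4.0, 3.25]
ai_bcs = [2.5, 3.25, 3.25, 3.0, 2.75, 3.0, 3.5, 3.5]

for tol in (0.0, 0.25, 0.50):
    print(f"PA within ±{tol:.2f}: {percentage_agreement(man_bcs, ai_bcs, tol):.1f}%")
print(f"weighted kappa: {weighted_kappa(man_bcs, ai_bcs):.2f}")
print(f"mean bias (AI - manual): {mean_bias(man_bcs, ai_bcs):+.4f}")
```

The sketch also shows why PA and κw can diverge, as reported for ΔBCS: PA depends only on the paired differences, whereas κw is scaled by the disagreement expected by chance, so it drops when most observations fall into a few adjacent categories.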