Assessing optimal frequency for image acquisition in computer vision systems developed to monitor feeding behavior of group-housed Holstein heifers

T. Bresolin, R. Ferreira, J. R. R. Dórea, J. Van Os, F. Reyes

doi:10.3168/jds.2022-22138

Abstract

Computer vision systems have emerged as a potential tool to monitor the behavior of livestock. Such high-throughput systems can generate massive, redundant data sets for training and inference, which can lead to higher computational and economic costs. The objectives of this study were (1) to develop a computer vision system to individually monitor detailed feeding behaviors of group-housed dairy heifers, and (2) to determine the optimal frequency of image acquisition for inference with minimal effect on the quality of feeding behavior predictions. Eight Holstein heifers (96 ± 6 d old) were housed as a group, and a total of 25,214 images (1 image every second) were acquired using 1 RGB camera. A total of 2,209 images were selected, and each animal in each image was labeled with its respective identification (1-8). Labels were annotated only for animals that were at the feed bunk (head through the feed rail). From the labeled images, 1,392 were randomly selected to train a deep learning algorithm for object detection, YOLOv3 ("You Only Look Once" version 3), and 154 images were used for validation. An independent data set (testing set = the remaining 663 of the 2,209 images) was used to test the algorithm. The average accuracy for identifying individual animals in the testing set was 96.0%, and for heifers 1 to 8 the accuracy was 99.2, 99.6, 99.2, 99.6, 99.6, 99.2, 99.4, and 99.6%, respectively. After identifying the animals at the feed bunk, we computed the following feeding behavior parameters for each heifer: number of visits (NV), mean visit duration (MVD), mean interval between visits (MIBV), and feeding time (FT), using a data set composed of 8,883 sequential images (1 image every second) from 4 time points. Considering 1 image every second, the coefficient of determination (R²) was 0.39, 0.78, 0.48, and 0.99, and the root mean square error (RMSE) was 12.3 (count), 0.78 min, 0.63 min, and 0.31 min for NV, MVD, MIBV, and FT, respectively.
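The abstract names the four feeding behavior parameters but does not spell out how they are derived from per-image detections. The following is a minimal sketch under stated assumptions, not the authors' implementation: each heifer's detections are reduced to a per-frame presence sequence (True when that heifer is detected at the feed bunk), a visit is assumed to be a maximal run of consecutive present frames with no gap-bridging threshold, and each frame is assumed to represent `interval_s` seconds. The names `feeding_behavior` and `FeedingBehavior` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FeedingBehavior:
    nv: int          # number of visits
    mvd_min: float   # mean visit duration (min)
    mibv_min: float  # mean interval between visits (min)
    ft_min: float    # total feeding time (min)


def feeding_behavior(present: Sequence[bool], interval_s: float = 1.0) -> FeedingBehavior:
    """Summarize one heifer's per-frame presence at the feed bunk.

    Assumption (hypothetical definition): a visit is a maximal run of
    consecutive frames in which the heifer is detected at the bunk, and
    each frame represents `interval_s` seconds.
    """
    visits: List[Tuple[int, int]] = []  # (first frame index, length in frames)
    start = None
    for i, at_bunk in enumerate(present):
        if at_bunk and start is None:
            start = i                          # a visit begins
        elif not at_bunk and start is not None:
            visits.append((start, i - start))  # the visit ends
            start = None
    if start is not None:                      # visit still open at the end
        visits.append((start, len(present) - start))

    nv = len(visits)
    ft = sum(length for _, length in visits) * interval_s / 60.0
    # MVD and MIBV are undefined without visits (or gaps); 0.0 is used here.
    mvd = ft / nv if nv else 0.0
    # Gap (in frames) between the end of one visit and the start of the next.
    gaps = [visits[k + 1][0] - (visits[k][0] + visits[k][1]) for k in range(nv - 1)]
    mibv = (sum(gaps) / len(gaps)) * interval_s / 60.0 if gaps else 0.0
    return FeedingBehavior(nv, mvd, mibv, ft)


# Synthetic example (not study data): visits of 2 and 3 min separated by 1 min.
seq = [False] * 10 + [True] * 120 + [False] * 60 + [True] * 180 + [False] * 30
print(feeding_behavior(seq))
# FeedingBehavior(nv=2, mvd_min=2.5, mibv_min=1.0, ft_min=5.0)
```

Under this run-based definition, a single missed detection splits one visit into two, which inflates NV and shortens MVD; a practical pipeline would likely merge short gaps before counting visits, but the abstract does not state how a visit was defined in the study.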
When we moved from 1 image per second to 1 image every 5 s (MIBV) or every 10 s (NV, MVD, and FT), the R² values were 0.55 (NV), 0.74 (MVD), 0.70 (MIBV), and 0.99 (FT), and the RMSE values were 2.27 (NV, count), 0.38 min (MVD), 0.22 min (MIBV), and 0.44 min (FT). Our results indicate that computer vision systems can be used to individually identify group-housed Holstein heifers (overall accuracy = 99.4%). Based on individual identification, feeding behaviors such as MVD, MIBV, and FT can be monitored with reasonable accuracy and precision. Independently of the specific optimal frequency, our results suggest that longer time intervals between image acquisitions would reduce data collection and model inference while maintaining adequate predictive performance. However, we did not find a single optimal time interval for all feeding behaviors; instead, the optimal frequency of image acquisition is phenotype-specific. Overall, the best R² and RMSE for NV, MVD, and FT were achieved using 1 image every 10 s, and for MIBV using 1 image every 5 s; in both cases, model inference and data storage could be drastically reduced.
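Reducing the acquisition frequency can be mimicked by keeping 1 frame every k seconds from a 1-frame-per-second sequence and comparing the resulting estimates with reference values using RMSE and R². The sketch below does this for feeding time only. It assumes that each retained frame stands for k seconds and that R² = 1 - SS_res/SS_tot; the abstract does not state how R² was computed (it may instead come from a regression of observed on predicted values). The helper names and the heifer sequences are hypothetical and synthetic, not study data.

```python
import math
from typing import Sequence


def subsample(present: Sequence[bool], every_s: int) -> list:
    """Keep 1 frame every `every_s` frames of a 1-frame-per-second sequence."""
    return list(present[::every_s])


def feeding_time_min(present: Sequence[bool], interval_s: float) -> float:
    """Feeding time (min), assuming each frame stands for `interval_s` seconds."""
    return sum(present) * interval_s / 60.0


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between paired observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-second presence sequences (synthetic, not study data).
heifers = {
    "H1": [False] * 5 + [True] * 12 + [False] * 13,
    "H2": [True] * 120 + [False] * 35,
}
obs = [feeding_time_min(seq, 1) for seq in heifers.values()]                   # 1 image every second
pred = [feeding_time_min(subsample(seq, 10), 10) for seq in heifers.values()]  # 1 image every 10 s
print(f"FT: RMSE = {rmse(obs, pred):.3f} min, R2 = {r_squared(obs, pred):.3f}")
```

Because both the reference and the subsampled estimate come from the same synthetic sequence here, the RMSE isolates the error introduced purely by sampling less often; in the study, the computer vision estimates were presumably compared against reference values instead. Repeating the comparison for k = 5 or k = 10 and for each parameter illustrates how the best interval can differ between phenotypes.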

Full Text