Disentangling data dependency using cross-validation strategies to evaluate prediction quality of cattle grazing activities using machine learning algorithms and wearable sensor data.

Leonardo Augusto Coelho Ribeiro, Guilherme Jordão De Magalhães Rosa, Tiago Bresolin, João Ricardo Rebouças Dórea, Daniel Rume Casagrande, Marina De Arruda Camargo Danes

doi:10.1093/jas/skab206

Open Access

Abstract

Wearable sensors have been explored as an alternative for real-time monitoring of cattle feeding behavior in grazing systems. To evaluate the performance of predictive models such as machine learning (ML) techniques, data cross-validation (CV) approaches are often employed. However, due to data dependencies and confounding effects, poorly designed validation strategies may substantially inflate the apparent prediction quality. In this context, our objective was to evaluate the effect of different CV strategies on the prediction of grazing activities in cattle using wearable sensor (accelerometer) data and ML algorithms. Six Nellore bulls (average live weight of 345 ± 21 kg) had their behavior visually classified as grazing or not-grazing for a period of 15 d. An Elastic Net Generalized Linear Model (GLM), Random Forest (RF), and an Artificial Neural Network (ANN) were employed to predict grazing activity (grazing or not-grazing) from 3-axis accelerometer data. For each analytical method, three CV strategies were evaluated: holdout, leave-one-animal-out (LOAO), and leave-one-day-out (LODO). Algorithms were trained on datasets of similar size (holdout: n = 57,862; LOAO: n = 56,786; LODO: n = 56,672). Overall, GLM delivered the lowest prediction accuracy (53%) compared with the ML techniques (65% for both RF and ANN), and ANN performed slightly better than RF for LOAO (73%) and LODO (64%) across CV strategies. Holdout yielded the highest nominal accuracy for all three approaches (GLM: 59%, RF: 76%, and ANN: 74%), followed by LODO (GLM: 49%, RF: 61%, and ANN: 63%) and LOAO (GLM: 52%, RF: 57%, and ANN: 57%). With a larger dataset (i.e., more animals and grazing management scenarios), accuracy could be expected to increase. Most importantly, the higher prediction accuracy observed for holdout CV may simply reflect a lack of data independence and the presence of carry-over effects from animals and grazing management. Our results suggest that generalizing predictive models to unknown animals or grazing management conditions (i.e., those not used for training) may result in poor prediction quality. The results highlight the need to use management knowledge to define a validation strategy that is close to the real-life situation, i.e., the intended application of the predictive model.
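
The three validation schemes compared in the abstract differ only in how labelled records are assigned to training and test sets. The sketch below illustrates that difference under stated assumptions: each record is assumed to carry an animal ID and a recording day, the layout (6 animals × 15 days × 4 records per day) is hypothetical, and the helper names `holdout_split`, `leave_one_group_out` and `shared_groups` are illustrative, not taken from the study. Model fitting (GLM, RF, ANN) is omitted, and this is not the authors' pipeline; in practice scikit-learn's `LeaveOneGroupOut` implements the same grouping idea.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Set, Tuple

# (training row indices, test row indices)
Split = Tuple[List[int], List[int]]


def holdout_split(n_rows: int, test_fraction: float = 0.2, seed: int = 0) -> Split:
    """Random holdout: rows are shuffled individually, so records from the
    same animal (and the same day) usually land in both train and test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(n_rows * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def leave_one_group_out(groups: Sequence[Hashable]) -> Iterator[Tuple[Hashable, Split]]:
    """One fold per distinct group value: all rows of the held-out group form
    the test set, and none of them is used for training."""
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, (train, test)


def shared_groups(groups: Sequence[Hashable], split: Split) -> Set[Hashable]:
    """Group values that appear in both the training and the test rows."""
    train, test = split
    return {groups[i] for i in train} & {groups[i] for i in test}


# Hypothetical layout: 6 animals x 15 days x 4 labelled records per day.
N_ANIMALS, N_DAYS, RECORDS_PER_DAY = 6, 15, 4
animal = [a for a in range(N_ANIMALS) for _ in range(N_DAYS * RECORDS_PER_DAY)]
day = [d for _ in range(N_ANIMALS) for d in range(N_DAYS) for _ in range(RECORDS_PER_DAY)]

holdout = holdout_split(len(animal))
loao = list(leave_one_group_out(animal))  # leave-one-animal-out
lodo = list(leave_one_group_out(day))     # leave-one-day-out

print("holdout: animals in both train and test:", sorted(shared_groups(animal, holdout)))
for name, folds, groups in (("LOAO", loao, animal), ("LODO", lodo, day)):
    leaks = sum(len(shared_groups(groups, split)) for _, split in folds)
    print(f"{name}: {len(folds)} folds, held-out groups also seen in training: {leaks}")
```

With this layout the random holdout always shares at least one animal between training and test (72 test rows cannot all come from a single animal with 60 rows), which is the kind of dependence the abstract warns about; the grouped LOAO and LODO folds exclude it by construction.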

Journal: Journal of Animal Science
Publication Date: Jul 5, 2021
Citations: 9

Similar Papers

PSXI-22 Prediction quality of cattle behavior traits evaluated through different cross-validation strategies using wearable sensor data and machine learning algorithms
Leonardo Augusto Coelho Ribeiro ... Tiago Bresolin
Journal of Animal Science | VOL. 98 | 30 Nov 2020

Enhancing Large-Diameter Tunnel Construction Safety with Robust Optimization and Machine Learning Integrated into BIM
Jagendra Singh ... Sandeep Kumar
The Open Civil Engineering Journal | VOL. 18 | 07 Oct 2024

Machine learning and generalized linear model techniques to predict aboveground biomass in Amazon rainforest using LiDAR data
Mateus Schuh ... Elias Fernando Berra
Journal of Applied Remote Sensing | VOL. 14 | 02 Sep 2020

Prediction of oil and gas pipeline failures through machine learning approaches: A systematic review
Abdulnaser M Al-Sabaeei ... Ajayshankar Jagadeesh
Energy Reports | VOL. 10 | 16 Aug 2023
