Abstract

Typical dairy cow lameness actions last only a few seconds and exhibit characteristic spatiotemporal structure. Manual lameness detection is problematic: sudden, severe, or early lameness behavior is difficult to detect by observation alone. Convolutional neural networks can capture this structure and learn action representations, but such representations are typically learned over only a few video frames and therefore fail to model actions across their full temporal extent. In this paper, we learn video representations with a single-stream long-term optical flow convolutional network. To evaluate the method, 756 of 1080 dairy cow videos were randomly selected for training, and the remaining 324 videos were used for testing. The experimental results demonstrate that single-stream long-term optical flow convolutional network models with increased temporal extents improve the accuracy of dairy cow lameness action recognition. We also study the impact of different low-level representations, such as raw video pixel values and optical flow vector fields, and demonstrate the importance of high-quality optical flow estimation for learning accurate dairy cow lameness action models. We report state-of-the-art results on our challenging dairy cow lameness action video set (98.24% accuracy).
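To make the core idea concrete, the sketch below shows a single-stream 3D convolutional network applied to a long stack of optical flow fields, so that temporal convolutions span the full duration of a lameness action rather than a few frames. The layer sizes, the clip length of 60 flow fields, and the binary lame/non-lame output head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (illustrative only, not the paper's exact architecture):
# a single-stream 3D CNN over a long stack of optical flow fields.
import torch
import torch.nn as nn

class FlowLTC(nn.Module):
    """Long-term temporal convolution over stacked optical flow.

    Input shape: (batch, 2, T, H, W) -- 2 channels for the (dx, dy)
    flow components, T consecutive flow fields covering the action.
    """
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),   # pool space only, keep full temporal extent early
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),           # then pool over time and space
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # global average pooling
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classifier(f)

# Usage example: a batch of 4 clips, each a stack of 60 flow fields
# of 58x58 pixels (clip length and resolution are assumptions).
model = FlowLTC()
clips = torch.randn(4, 2, 60, 58, 58)
logits = model(clips)  # shape (4, 2): lame vs. non-lame scores
```

The design choice this sketch highlights is that temporal pooling is deferred until after the first block, so the network can convolve over the full 60-field stack and capture slow gait cues that a few-frame input would miss.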
