Simple Summary
The recognition of objects in three-dimensional (3D) data is challenging, especially when it comes to partitioning objects into predefined segments. In this study, two machine learning approaches were applied to recognize the body parts head, rump, back, legs and udder of dairy cows in 3D data recorded with the Microsoft Kinect V1, for use in automated exterior evaluation. Five properties of the data points were used as features to train a k nearest neighbour classifier and a neural network to determine the body part on which each point was located. Both algorithms used for this feature-based approach are computationally faster and more flexible than the model-based object detection previously used for the same purpose. Both methods can be considered successful for the determination of body parts via pixel properties. Reaching high to very high overall accuracies and very small Hamming losses, the k nearest neighbour classification is superior to the neural network, which reached medium to high evaluation metrics. However, k nearest neighbour classification is prone to higher computational time and memory costs at runtime, whereas the neural network, once trained, delivers classification results very quickly. This also needs to be taken into account when deciding which method should be implemented.

Abstract
Machine learning methods have become increasingly important in animal science, and the success of an automated application using machine learning often depends on the right choice of method for the respective problem and data set. The recognition of objects in 3D data is still a widely studied topic and is especially challenging when it comes to partitioning objects into predefined segments. In this study, two machine learning approaches were used to recognize body parts of dairy cows from 3D point clouds, i.e., sets of data points in space. The low-cost off-the-shelf depth sensor Microsoft Kinect V1 has been used in various studies related to dairy cows. The 3D data were gathered with a multi-Kinect recording unit designed to record freely walking Holstein Friesian cows from both sides from three different camera positions. For the determination of the body parts head, rump, back, legs and udder, five properties of the pixels in the depth maps (row index, column index, depth value, variance, mean curvature) were used as features in the training data set. For each camera position, a k nearest neighbour classifier and a neural network were trained and subsequently compared. Both methods showed small Hamming losses (between 0.007 and 0.027 for k nearest neighbour (kNN) classification and between 0.045 and 0.079 for neural networks) and could be considered successful regarding the classification of pixels to body parts. However, the kNN classifier was superior, reaching overall accuracies of 0.888 to 0.976, varying with the camera position. Precision and recall values associated with individual body parts ranged from 0.84 to 1 and from 0.83 to 1, respectively. At runtime, however, kNN classification is prone to higher costs in terms of computational time and memory than the neural networks, which deliver their classification results very quickly once trained. The cost vs. accuracy ratio of each methodology needs to be taken into account when deciding which method should be implemented in the application.
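The abstract does not include code, but the described pipeline (five per-pixel features from a depth map, then one kNN classifier and one neural network trained per camera position) can be sketched briefly. The following is a minimal sketch, not the authors' implementation: the window size for the local variance, the smoothed-Laplacian proxy for mean curvature, the classifier hyperparameters and the use of NumPy/SciPy/scikit-learn are all assumptions.

```python
# Hypothetical sketch: per-pixel features from a depth map and the two
# classifier types named in the abstract, trained separately per camera position.
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

def pixel_features(depth):
    """Build an (n_pixels, 5) matrix: row index, column index, depth, variance, mean curvature."""
    depth = np.asarray(depth, dtype=float)
    rows, cols = np.indices(depth.shape)
    # Local variance in a 5x5 window (window size is an assumed choice).
    local_mean = uniform_filter(depth, size=5)
    local_var = uniform_filter(depth ** 2, size=5) - local_mean ** 2
    # Rough mean-curvature proxy from smoothed second derivatives (assumed formulation).
    smoothed = gaussian_filter(depth, sigma=1.0)
    dzdy, dzdx = np.gradient(smoothed)
    dzyy, _ = np.gradient(dzdy)
    _, dzxx = np.gradient(dzdx)
    curvature = 0.5 * (dzxx + dzyy)
    return np.stack([rows, cols, depth, local_var, curvature], axis=-1).reshape(-1, 5)

def train_classifiers(X_train, y_train):
    """Train one kNN classifier and one small neural network on labelled pixels of one camera position."""
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)            # k is an assumed value
    net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_train, y_train)  # assumed architecture
    return knn, net
```

In such a setup the labels would be the five body-part classes (head, rump, back, legs, udder) assigned to each pixel, and the two functions would be run once per camera position, mirroring the per-position comparison described above.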
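The reported evaluation metrics (Hamming loss, overall accuracy, per-body-part precision and recall) could be computed with standard scikit-learn functions, as in the sketch below. Variable names are placeholders; note that for single-label multiclass predictions the scikit-learn Hamming loss equals 1 minus accuracy, so the paper's figures may rest on a per-label (one-vs-rest) formulation instead.

```python
# Hypothetical evaluation sketch for the metrics named in the abstract.
from sklearn.metrics import hamming_loss, accuracy_score, precision_score, recall_score

BODY_PARTS = ["head", "rump", "back", "legs", "udder"]

def evaluate(clf, X_test, y_test):
    """Hamming loss, overall accuracy and per-body-part precision/recall for one classifier."""
    y_pred = clf.predict(X_test)
    precisions = precision_score(y_test, y_pred, labels=BODY_PARTS, average=None, zero_division=0)
    recalls = recall_score(y_test, y_pred, labels=BODY_PARTS, average=None, zero_division=0)
    return {
        "hamming_loss": hamming_loss(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": dict(zip(BODY_PARTS, precisions)),
        "recall": dict(zip(BODY_PARTS, recalls)),
    }
```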