Automatic recognition of lactating sow postures by refined two-stream RGB-D faster R-CNN

Xunmu Zhu, Changxin Chen, Bin Zheng, Xiaofan Yang, Haiming Gan, Chan Zheng, Aqing Yang, Liang Mao, Yueju Xue

doi:10.1016/j.biosystemseng.2019.11.013

Abstract

This paper proposes an end-to-end refined two-stream RGB-D Faster region-based convolutional neural network (R-CNN) algorithm, which fuses RGB and depth image features in the feature extraction stage to recognise five postures of lactating sows (standing, sitting, sternal recumbency, ventral recumbency, and lateral recumbency) in pig farm scenes. Building on the Faster R-CNN algorithm, two CNNs first extract the RGB image features and the depth image features separately. A single RGB-D region proposal network then generates the regions of interest (ROIs) shared by the two types of feature maps. Next, the features of the RGB-D ROIs are extracted and merged by a feature fusion layer. Finally, the fused ROI features are input into a Fast R-CNN to obtain the recognition results. RGB-D image pairs of the five postures were captured with a Kinect v2.0 sensor in 28 pens: 12,600 pairs randomly selected from the first 21 pens formed the training set, and 5,533 pairs randomly selected from the remaining 7 pens formed the test set. Among the fusion methods evaluated, concatenation fusion gave the highest recognition accuracy on the test set, with average precisions for the five posture categories of 99.74%, 96.49%, 90.77%, 90.91%, and 99.45%, respectively. Compared with related methods (RGB-only, depth-only, RGB-D early fusion, and late fusion), our method attained the highest mean average precision.
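As an illustration of the concatenation fusion and the mean average precision mentioned in the abstract, the following minimal Python sketch may help. It is not the authors' implementation: the function name `concat_fuse`, the toy feature values, and the nested-list representation of a ROI feature map are assumptions made here for illustration. The mean is computed as an unweighted average of the five per-posture average precisions listed above, which is also an assumption about how the summary figure is defined.

```python
from statistics import mean


def concat_fuse(rgb_roi, depth_roi):
    """Channel-wise concatenation of two ROI feature maps (hypothetical helper).

    Each feature map is a list of channels; each channel is an H x W nested list.
    The two maps must share the same spatial size, as they would after ROI pooling.
    """
    if not rgb_roi or not depth_roi:
        raise ValueError("both ROI feature maps need at least one channel")
    rgb_size = (len(rgb_roi[0]), len(rgb_roi[0][0]))
    depth_size = (len(depth_roi[0]), len(depth_roi[0][0]))
    if rgb_size != depth_size:
        raise ValueError("RGB and depth ROI features must have the same H x W")
    # Stack the depth channels after the RGB channels.
    return list(rgb_roi) + list(depth_roi)


# Toy ROI features (assumed values): 2 RGB channels and 1 depth channel, each 2 x 2.
rgb_roi = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
depth_roi = [[[0.5, 0.5], [0.5, 0.5]]]
fused = concat_fuse(rgb_roi, depth_roi)  # 3 channels, each 2 x 2

# Per-posture average precisions (%) as listed in the abstract, in the same order.
ap_percent = {
    "standing": 99.74,
    "sitting": 96.49,
    "sternal recumbency": 90.77,
    "ventral recumbency": 90.91,
    "lateral recumbency": 99.45,
}
# Unweighted mean of the five values (assumed definition of the summary metric).
map_percent = mean(ap_percent.values())
print(f"channels after fusion: {len(fused)}, mean AP: {map_percent:.2f}%")
```

In a real network the fused tensor would be passed to the Fast R-CNN head; here the concatenation only shows that the fused ROI carries the channels of both streams.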

Full Text