The end effector of a conventional pod pepper harvester is mounted at a fixed height, so it cannot adapt to the varying growth height of clustered peppers. First, to address the small size and clustered growth of pepper fruits in the recognition task, an improved Faster R-CNN algorithm is proposed. On the one hand, the detection accuracy for tiny targets is improved by increasing the types and number of high-resolution anchors and by replacing RoI Pooling with RoI Align. On the other hand, a ResNet+FPN backbone replaces the VGG16 and plain ResNet backbones as the low-level feature extractor, which effectively strengthens the extraction of small features. Furthermore, to locate clustered peppers precisely, a height calculation model that combines the 2D image recognition results with the corresponding depth information is proposed. Comparative experiments show that the overall accuracy metrics AP and AP_50 of our method reach 75.79% and 87.30%, respectively. Compared with the VGG16 feature extraction model, these two metrics are improved by 8.7% and 1.3%, respectively. The small-target detection accuracy AP_small increases by about 11.4%, and the small-target recall AR_small increases by up to 10.2%.
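To illustrate why RoI Align helps with tiny targets, the following is a minimal pure-Python sketch (not the authors' implementation) contrasting the two pooling schemes on a single-channel feature map. RoI Pooling rounds the RoI boundaries to integer feature-map cells before max-pooling each bin, whereas RoI Align keeps fractional coordinates and averages bilinearly interpolated samples in each bin. The function names `bilinear`, `roi_pool`, and `roi_align`, the 2×2 output size, and the 2×2 sampling grid per bin are illustrative assumptions; real Faster R-CNN implementations operate on batched multi-channel tensors.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]           # single-channel map, indexed fmap[y][x]
RoI = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map units


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate fmap at a fractional (y, x), clamped to the map edges."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y_lo, x_lo = int(math.floor(y)), int(math.floor(x))
    y_hi, x_hi = min(y_lo + 1, h - 1), min(x_lo + 1, w - 1)
    dy, dx = y - y_lo, x - x_lo
    top = fmap[y_lo][x_lo] * (1 - dx) + fmap[y_lo][x_hi] * dx
    bottom = fmap[y_hi][x_lo] * (1 - dx) + fmap[y_hi][x_hi] * dx
    return top * (1 - dy) + bottom * dy


def roi_pool(fmap: FeatureMap, roi: RoI, out: int = 2) -> FeatureMap:
    """RoI Pooling: round the RoI to integer cells, then max-pool each bin."""
    h, w = len(fmap), len(fmap[0])
    x1, y1, x2, y2 = (int(round(c)) for c in roi)  # the quantization step
    bin_h = max(y2 - y1 + 1, 1) / out
    bin_w = max(x2 - x1 + 1, 1) / out
    result = []
    for i in range(out):
        ys = min(max(y1 + math.floor(i * bin_h), 0), h)
        ye = min(max(y1 + math.ceil((i + 1) * bin_h), 0), h)
        row = []
        for j in range(out):
            xs = min(max(x1 + math.floor(j * bin_w), 0), w)
            xe = min(max(x1 + math.ceil((j + 1) * bin_w), 0), w)
            cells = [fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            row.append(max(cells) if cells else 0.0)
        result.append(row)
    return result


def roi_align(fmap: FeatureMap, roi: RoI, out: int = 2, samples: int = 2) -> FeatureMap:
    """RoI Align: keep fractional bin edges and average bilinear samples per bin."""
    x1, y1, x2, y2 = roi  # no rounding
    bin_h = (y2 - y1) / out
    bin_w = (x2 - x1) / out
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(fmap, y, x)
            row.append(total / samples ** 2)
        result.append(row)
    return result
```

For example, on a feature map whose value equals the column index, shifting an RoI from (1.3, 1.3, 3.3, 3.3) to (1.4, 1.4, 3.4, 3.4) leaves the RoI Pooling output unchanged, because both boxes round to the same cells, while the RoI Align output shifts by 0.1. For a small pepper that spans only a few feature-map cells, such sub-cell misalignment can be a large fraction of the object.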
The overall loss rate (Loss) is reduced by 4.7%, showing a marked improvement over the YOLOv3 model. The single-frame detection time is 42 ms, slightly longer than that of the YOLOv3 network but still sufficient for the real-time detection requirements of the pepper harvester. In the 3D localization experiment, the mean absolute error of the clustered peppers' height above the ground is 4.4 mm, corresponding to a mean relative error of 1.1%, which satisfies the height-adjustment error requirement of the end effector.
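The height calculation model is only summarized in the abstract. The following is a minimal sketch of one way to combine a 2D detection box with depth, assuming a pinhole RGB-D camera with known intrinsics (fx, fy, cx, cy), a depth map that stores distance along the optical axis in millimetres, flat ground, and a camera mounted at a known height with a known downward pitch. The names `Intrinsics`, `box_depth`, `height_above_ground`, and `height_errors` are hypothetical and do not come from the paper. The box centre is back-projected into the camera frame, rotated by the pitch into a gravity-aligned frame, and its vertical drop below the camera is subtracted from the camera height. `height_errors` computes a mean absolute error and a mean relative error, the kind of metrics reported for the 3D location experiment.

```python
import statistics
from dataclasses import dataclass
from math import cos, radians, sin
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def box_depth(depth_map: List[List[float]], box: Box, win: int = 2) -> Optional[float]:
    """Median of valid (non-zero) depth values in a small window around the box centre."""
    x1, y1, x2, y2 = box
    u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)
    rows, cols = len(depth_map), len(depth_map[0])
    vals = [
        depth_map[y][x]
        for y in range(max(v - win, 0), min(v + win + 1, rows))
        for x in range(max(u - win, 0), min(u + win + 1, cols))
        if depth_map[y][x] > 0  # assumption: zero marks a missing depth reading
    ]
    return statistics.median(vals) if vals else None


def height_above_ground(box: Box, depth_mm: float, K: Intrinsics,
                        cam_height_mm: float, pitch_deg: float = 0.0) -> float:
    """Height of the box centre above flat ground, in millimetres."""
    _, y1, _, y2 = box
    v = (y1 + y2) / 2
    # Back-project into the camera frame (x right, y down, z along the optical axis).
    y_c = (v - K.cy) * depth_mm / K.fy
    z_c = depth_mm
    # Rotate by the downward pitch to get the vertical drop below the camera centre.
    t = radians(pitch_deg)
    drop = y_c * cos(t) + z_c * sin(t)
    return cam_height_mm - drop


def height_errors(est: Sequence[float], truth: Sequence[float]) -> Tuple[float, float]:
    """Mean absolute error (mm) and mean relative error (fraction of true height)."""
    abs_err = [abs(e - t) for e, t in zip(est, truth)]
    mae = sum(abs_err) / len(abs_err)
    mre = sum(a / t for a, t in zip(abs_err, truth)) / len(abs_err)
    return mae, mre


if __name__ == "__main__":
    K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # hypothetical values
    depth = [[1000.0] * 640 for _ in range(480)]            # synthetic flat depth map
    box = (300.0, 280.0, 340.0, 320.0)
    d = box_depth(depth, box)
    print(height_above_ground(box, d, K, cam_height_mm=800.0))  # prints 700.0
```

Taking the median over a small window, rather than the single centre pixel, is one simple way to tolerate holes and outliers in the depth map; the paper may use a different strategy.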