Abstract

Autonomous harvesting is a promising direction for the future development of the agriculture industry, and the vision system is one of the most challenging components of autonomous harvesting technologies. This work proposes a multi-function network that performs real-time detection and semantic segmentation of apples and branches in orchard environments using a visual sensor. The developed detection and segmentation network applies atrous spatial pyramid pooling and a gated feature pyramid network to enhance the feature extraction ability of the network. To improve the real-time computational performance of the network model, a lightweight backbone network based on the residual network architecture is developed. In the experiments, the detection and segmentation network with a ResNet-101 backbone performed best on the detection and segmentation tasks, achieving an F1 score of 0.832 on the detection of apples and 87.6% and 77.2% on the semantic segmentation of apples and branches, respectively. The network model with the lightweight backbone showed the best computational efficiency, achieving an F1 score of 0.827 on the detection of apples and 86.5% and 75.7% on the segmentation of apples and branches, respectively, with a weight file size of 12.8 M and a computation time of 32 ms. The experimental results show that the detection and segmentation network can effectively perform real-time detection and segmentation of apples and branches in orchards.
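The atrous spatial pyramid pooling mentioned in the abstract fuses context at multiple scales by running parallel dilated convolutions over the same feature map. The sketch below illustrates the general technique in PyTorch; the channel counts and dilation rates are illustrative assumptions, not the configuration used in this work.

```python
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling (sketch): parallel 3x3 convolutions
    with increasing dilation rates see progressively larger receptive
    fields, and their outputs are concatenated and projected back to a
    single feature map. Rates (1, 6, 12, 18) are assumed for illustration."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Setting padding equal to the dilation rate keeps the spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))


# A 64-channel feature map keeps its 32x32 spatial size through the module.
y = ASPP(64, 128)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```

Because every branch preserves the spatial resolution, the module can be dropped between a backbone and a segmentation head without resampling.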

Highlights

  • Apple harvesting is a labour-intensive, time-consuming, and costly task

  • A batch-normalisation layer is added after the gate, as our experiments show that it improves the performance of the network model

  • The Gated Feature Pyramid Network (GFPN) in the Detection and Segmentation Network (DaSNet) allows selective representation of features between different levels, which minimises the spatial shift of the feature maps from different levels and balances the backward-propagated gradients
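A single fusion step of the gated feature pyramid described in the highlights can be sketched as a learned sigmoid gate that weights the upsampled higher-level feature map before it is merged with the lateral feature, with batch normalisation applied after the gate. The layer sizes and the exact gating form below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFusion(nn.Module):
    """One gated FPN fusion step (sketch): a 1x1 convolution over the
    concatenated features produces a per-pixel, per-channel gate in (0, 1)
    that scales the higher-level feature before addition. A batch-norm
    layer follows the gated merge, as the highlight describes."""

    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Conv2d(2 * ch, ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, lateral, higher):
        # Upsample the coarser, higher-level map to the lateral resolution.
        higher = F.interpolate(higher, size=lateral.shape[-2:], mode="nearest")
        g = torch.sigmoid(self.gate(torch.cat([lateral, higher], dim=1)))
        return self.bn(lateral + g * higher)


fuse = GatedFusion(64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Gating each channel and location separately is what lets the network suppress higher-level features where they are spatially misaligned with the lateral map, rather than averaging the two levels uniformly.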

Introduction

Apple harvesting is a labour-intensive, time-consuming, and costly task. The ageing population and the cost of human resources have led to a decrease in the labour force available for agricultural harvesting [1]. To increase the success rate and reduce the damage rate of automatic fruit harvesting, information on the fruit pose [3] and on the stem–branch joint location and orientation [4] is required. This demands that the robotic vision system accurately and robustly extract geometric and semantic information from the working scene in the orchard environment [5]. A multi-function Deep Convolution Neural Network (DCNN) is developed to perform real-time detection and semantic segmentation of apples and branches in orchards.

Related Work
Vision Sensing System
Gated-FPN for Multi-level Fusion
ASPP for Multi-Scale Fusion
Lightweight Designed Backbone
Data Collection
Training Method
Experiment on Network Architectures
Experiment Results and Discussion