Abstract

• Practitioner guidelines for the number of fruit to label for deep learning.
• Predict AP score prior to training by calculating a novel similarity score.
• Study of relationships between data-centric attributes and performance.
• Evaluation on open-source datasets and a modern deep learning method (YOLOv5).

Due to the rapid development of deep learning, object detection models have become the tool of choice for on-tree fruit detection in precision agriculture. A deep learning fruit detection pipeline generally starts with custom dataset collection, followed by image annotation, training of the object detection model, and finally evaluation of its accuracy and deployment of the trained model in applications. To achieve better fruit detection performance, most research has focused on the third stage of the pipeline, namely improving or adjusting state-of-the-art object detection models. However, the first two, data-centric stages of the pipeline also require more investigation. For example, there is very limited research on how many annotations are sufficient for training, or on how strongly image quality influences training performance for single-class fruit detection. Therefore, in this study, we thoroughly analysed seven public on-tree fruit datasets covering apples, almonds, mangoes, and grape bunches under different image conditions. Our experiment on training dataset size indicates that 2500 annotated objects are generally sufficient for single-class fruit training, and our experiment on object size shows that larger objects have the potential to achieve better accuracy. We then propose a novel similarity score that allows readers to easily estimate the expected Average Precision (AP) score without any training. The last two data-centric experiments indicate that blurriness has only a minor influence on training accuracy, whereas less complex objects can potentially achieve better accuracy. Overall, this numerical data-centric analysis of on-tree fruit detection provides a better understanding of how data-centric attributes influence training accuracy, which helps practitioners prepare better-quality datasets and thus achieve higher training accuracy.

