Synthetic meets authentic: Leveraging LLM generated datasets for YOLO11 and YOLOv10-based apple detection through machine vision sensors

Ranjan Sapkota, Zhichao Meng, Manoj Karkee

doi:10.1016/j.atech.2024.100614

Abstract

Training machine learning (ML) models for artificial intelligence (AI) and computer vision-based object detection typically requires large, labeled datasets, and building them is often burdened by significant human effort and by the high cost of imaging systems and image acquisition. This research aimed to simplify image data collection for object detection in orchards by avoiding traditional fieldwork with different imaging sensors. Using OpenAI's DALL-E, a large language model (LLM) for realistic image generation, we generated and annotated a cost-effective dataset. This dataset, generated exclusively by the LLM, was then used to train two state-of-the-art deep learning models: YOLOv10 and YOLO11. YOLO11 was trained in five configurations (YOLO11n, YOLO11s, YOLO11m, YOLO11l and YOLO11x) and YOLOv10 in six (YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l and YOLOv10x). The trained models were then tested on real-world (outdoor orchard) images captured by a digital camera (Nikon D5100) and a consumer RGB-D camera (Microsoft Azure Kinect). YOLO11 outperformed YOLOv10: YOLO11x and YOLO11n achieved the highest precision, 0.917 and 0.916, respectively. YOLO11l achieved the highest recall among its counterparts, 0.889, and YOLO11n achieved the highest mean average precision at an IoU threshold of 0.5 (mAP@50), 0.958. In validation tests on real images of the Scilate apple variety collected with the Nikon D5100 camera in a commercial orchard, YOLO11s achieved the highest precision (0.874), YOLO11l the highest recall (0.877) and YOLO11x the highest mAP@50 (0.91). In validation tests on images of the same orchard collected with the Microsoft Azure Kinect camera, YOLO11x achieved the highest precision, recall and mAP@50: 0.924, 0.781 and 0.855, respectively.
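The abstract reports precision, recall and mAP@50 without spelling out how they are computed. Below is a minimal sketch under common object-detection conventions, assuming a single "apple" class (so mAP@50 reduces to AP@50), axis-aligned (x1, y1, x2, y2) boxes, greedy confidence-ordered matching at IoU >= 0.5, and all-point interpolated AP. All function names are hypothetical, and the evaluation tooling used in the study (its interpolation scheme or the confidence threshold at which precision and recall are reported) may differ.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1
Prediction = Tuple[float, Box]           # (confidence, box)
Match = Tuple[float, bool]               # (confidence, is_true_positive)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds: Sequence[Prediction], gts: Sequence[Box],
                     iou_thr: float = 0.5) -> List[Match]:
    """Greedily match one image's predictions to its ground-truth boxes.

    Predictions are visited from highest to lowest confidence; each claims the
    still-unmatched ground truth with the largest IoU, if that IoU >= iou_thr.
    """
    used = [False] * len(gts)
    matches: List[Match] = []
    for conf, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True
        matches.append((conf, best_j >= 0))
    return matches


def precision_recall(matches: Sequence[Match], n_gt: int,
                     conf_thr: float = 0.25) -> Tuple[float, float]:
    """Precision and recall of the detections kept at an (assumed) confidence threshold."""
    kept = [is_tp for conf, is_tp in matches if conf >= conf_thr]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def average_precision(matches: Sequence[Match], n_gt: int) -> float:
    """All-point interpolated AP: area under the precision-recall envelope."""
    if n_gt == 0:
        return 0.0
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Envelope: precision at recall r becomes the best precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p  # only true positives increase recall
        prev_recall = r
    return ap


if __name__ == "__main__":
    # Toy image: two apples, three detections (a false positive has the top score).
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.95, (50, 50, 60, 60)), (0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31))]
    m = match_detections(preds, gts)
    p, r = precision_recall(m, len(gts), conf_thr=0.5)
    print(f"precision={p:.3f} recall={r:.3f} AP@50={average_precision(m, len(gts)):.3f}")
```

For a full test set, the match lists from every image would be concatenated and the ground-truth counts summed before calling `average_precision`. Because precision and recall depend on the confidence threshold, the `conf_thr` value here is only a placeholder.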
All YOLO11 variants required a pre-processing time of just 0.2 milliseconds (ms), faster than any YOLOv10 variant. On the LLM-generated dataset, YOLO11n achieved the fastest inference time, 3.2 ms, whereas YOLOv10n, the fastest YOLOv10 variant, required 5.5 ms. Likewise, YOLO11n gave the fastest inference times on the sensor images: 7.1 ms for the Nikon D5100 images and 4.7 ms for the Azure Kinect images. This study presents a pathway for using LLMs to generate large image datasets for challenging agricultural environments with little or no labor-intensive field data collection, which could accelerate the development and deployment of computer vision and robotic technologies in orchards.
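Per-image latencies in milliseconds translate directly into throughput. The short sketch below converts the timings reported above into an implied frames-per-second figure and a relative speedup. It assumes sequential, batch-size-1 processing; because post-processing time is not reported in the abstract, the resulting FPS values are upper bounds rather than end-to-end measurements. The helper names are hypothetical.

```python
def implied_fps(*stage_ms: float) -> float:
    """Images per second implied by per-image stage latencies given in milliseconds.

    Assumes the stages run one after another for a single image (batch size 1).
    """
    total_ms = sum(stage_ms)
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    return 1000.0 / total_ms


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the faster model's step is than the slower one's."""
    return slow_ms / fast_ms


# Latencies reported in the abstract (ms), LLM-generated dataset.
YOLO11N_PREPROCESS_MS = 0.2
YOLO11N_INFERENCE_MS = 3.2
YOLOV10N_INFERENCE_MS = 5.5

if __name__ == "__main__":
    print(f"YOLO11n, inference only: {implied_fps(YOLO11N_INFERENCE_MS):.1f} FPS")
    print("YOLO11n, pre-processing + inference: "
          f"{implied_fps(YOLO11N_PREPROCESS_MS, YOLO11N_INFERENCE_MS):.1f} FPS")
    print(f"YOLOv10n, inference only: {implied_fps(YOLOV10N_INFERENCE_MS):.1f} FPS")
    print("YOLO11n inference speedup over YOLOv10n: "
          f"{speedup(YOLOV10N_INFERENCE_MS, YOLO11N_INFERENCE_MS):.2f}x")
```

With these numbers, the inference step alone corresponds to roughly 312 images per second for YOLO11n versus roughly 182 for YOLOv10n, about a 1.7x speedup.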


Published in: Smart Agricultural Technology


Similar Papers

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.
Michael S Deiner ... Urmimala Sarkar
JMIR Infodemiology | VOL. 4 | 29 Aug 2024

The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis.
William Joel Waldock ... Hutan Ashrafian
Journal of Medical Internet Research | VOL. 26 | 05 Nov 2024

Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.
Kostis Giannakopoulos ... Vassilis Stamatopoulos
Journal of Medical Internet Research | VOL. 25 | 28 Dec 2023

Utilizing large language models for EFL essay grading: An examination of reliability and validity in rubric‐based assessments
Fatih Yavuz ... Özgür Çelik
British Journal of Educational Technology | 04 Jun 2024
