Accurate detection of capsicum in thermal images is important for greenhouse optimization and agricultural efficiency, but it is challenging because temperature variations are subtle and precise localization is required. This study presents an approach to thermal capsicum detection in greenhouse environments, addressing research gaps through the construction of a curated dataset of 300 thermal images. The images capture various occlusion levels and underwent preprocessing and augmentation to enhance diversity and quality. A custom-trained single-shot YOLOv10 model achieved precision, recall, F1-score, and mean Average Precision (mAP) of 0.979, 0.712, 0.824, and 0.883, respectively, with a detection speed of 53.92 FPS and a highest confidence score of 91%. In contrast, the zero-shot Grounding DINO model, tuned for capsicum detection in thermal scenes, reached a maximum confidence score of 60%, which led to the proposal of a novel SAM-DINO model. SAM-DINO combines the Segment Anything Model (SAM) for pixel-level segmentation with Grounding DINO for detection, and it raised the confidence score to 65%. Comparative testing of YOLOv10 and SAM-DINO on 50 test images showed that YOLOv10 performed better, with precision, recall, and F1-score of 0.95, 0.87, and 0.91, respectively, compared with 0.90, 0.72, and 0.80 for SAM-DINO. YOLOv10 also had an inference time of 19 ms, approximately ten times faster than SAM-DINO. These findings show that both custom-trained and zero-shot detection models can be effective for thermal capsicum detection, and they clarify the respective strengths and limitations of each in greenhouse agricultural applications.
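The reported F1-scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function name `f1_score` and the dictionary labels are illustrative and are not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
# The labels are illustrative names, not identifiers from the study.
reported = {
    "YOLOv10 (custom-trained, reported metrics)": (0.979, 0.712, 0.824),
    "YOLOv10 (50 test images)": (0.95, 0.87, 0.91),
    "SAM-DINO (50 test images)": (0.90, 0.72, 0.80),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed F1 = {f1:.3f}, reported F1 = {f1_reported}")
```

Under this definition, each computed value agrees with the reported F1-score once rounded to the precision given in the abstract.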