Abstract

Large language models (LLMs) have been extensively studied for their ability to engage in textual dialogue and have shown promising results across many fields. However, the agricultural industry has yet to fully integrate LLMs into its practice, because agricultural data are dominated by visual images that text-oriented LLMs cannot process effectively. Additionally, traditional image classification networks are limited in their understanding of crop etiology and disease, hindering accurate diagnosis, and co-occurring diseases can further interfere with a network's predictions. Accurately analyzing pests and diseases in agricultural scenarios and producing diagnostic reports therefore remains a challenge. To address this issue, this study proposes a novel approach that combines the deep logical reasoning capabilities of GPT-4 with the visual understanding capabilities of the YOLO (You Only Look Once) network. It also introduces YOLOPC, a new lightweight YOLO variant, and a novel image-to-text mapping method that adapts YOLO outputs for GPT. Experimental results show that YOLOPC achieves a 94.5% accuracy rate with approximately 75% fewer parameters than YOLOv5-nano, and that the GPT induction and reasoning module attains 90% reasoning accuracy when generating text-assisted agricultural diagnostic reports. As higher-performance GPT models are released, the combination of GPT with agricultural scenarios is likely to become a cornerstone of large-scale agricultural diagnostic models, and the proposed method will benefit the development of such models in the agricultural field.
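To make the pipeline concrete, the image-to-text mapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact implementation: the detection record fields (`label`, `confidence`, `bbox`) and the prompt wording are hypothetical stand-ins for whatever schema YOLOPC actually emits.

```python
def detections_to_prompt(detections, image_id="field_image"):
    """Convert YOLO-style detections into a text prompt for an LLM.

    Each detection is assumed to be a dict with 'label', 'confidence',
    and 'bbox' (x, y, w, h in relative coordinates). The field names
    and prompt format are illustrative, not the paper's exact schema.
    """
    if not detections:
        return f"Image {image_id}: no pest or disease detected."
    # Lead with a summary line, then list findings by descending confidence.
    lines = [f"Image {image_id}: detected {len(detections)} finding(s)."]
    ranked = sorted(detections, key=lambda d: -d["confidence"])
    for i, det in enumerate(ranked, start=1):
        x, y, w, h = det["bbox"]
        lines.append(
            f"{i}. {det['label']} (confidence {det['confidence']:.2f}) "
            f"at region x={x:.2f}, y={y:.2f}, w={w:.2f}, h={h:.2f}."
        )
    # Closing instruction steers the LLM toward a diagnostic report.
    lines.append(
        "Based on these findings, provide a diagnostic report with "
        "likely causes and recommended treatments."
    )
    return "\n".join(lines)


# Example usage with hypothetical detections:
detections = [
    {"label": "powdery mildew", "confidence": 0.91,
     "bbox": (0.12, 0.30, 0.25, 0.20)},
    {"label": "aphid cluster", "confidence": 0.77,
     "bbox": (0.55, 0.60, 0.10, 0.08)},
]
prompt = detections_to_prompt(detections, image_id="leaf_001")
print(prompt)
```

The resulting prompt would then be sent to a GPT-style model, which performs the induction and reasoning step to produce the final diagnostic report.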
