Knowledge-Infused Learning for Fine-Grained Plant Disease Recognition
Domain knowledge exists in various forms, including text, ontologies, graphs, images, audio, and video. In plant disease detection, most works use only images with disease labels, neglecting the textual descriptions of visual disease symptoms that human experts rely on for diagnosis. Together with sample images, these descriptions help experts identify visual symptoms. We propose a novel method that leverages both text descriptions and image data by modeling domain-specific knowledge about visual symptoms in leaf images as separate feature channels. Each channel corresponds to a specific feature whose presence or absence in the image influences the model's predictions. We introduce a channel attention-guided fusion module that weights each channel based on the input and the corresponding output. The combined feature channels are then transformed into a standard 3-channel format that any pre-trained convolutional neural network (CNN) can take as input for feature extraction and subsequent classification. Furthermore, the intermediate activations of the channel attention layer, combined with the weights of the fusion layer, make the model's predictions explainable. Experimental results on three publicly available datasets of apple and cucumber leaf diseases show improvements of up to 5% across various state-of-the-art CNN architectures, indicating the efficacy of incorporating textual disease descriptions with the proposed approach.
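The fusion step can be illustrated with a minimal NumPy sketch. The abstract does not specify the attention architecture, so this assumes a squeeze-and-excitation style channel attention (global average pooling, a two-layer bottleneck, and a sigmoid) followed by a 1×1 convolution that projects K knowledge channels down to 3. The class `ChannelAttentionFusion`, its `reduction` parameter, the randomly initialized weights, and the `channel_importance` score are all hypothetical illustrations; in a real model the weights would be learned end to end.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


class ChannelAttentionFusion:
    """Hypothetical sketch: SE-style attention over K knowledge channels,
    then a 1x1 'fusion' projection to 3 channels for a pre-trained CNN."""

    def __init__(self, num_channels: int, reduction: int = 2,
                 out_channels: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        hidden = max(1, num_channels // reduction)
        # Excitation MLP weights (random here; learned in practice).
        self.w1 = rng.normal(0.0, 0.1, (hidden, num_channels))
        self.w2 = rng.normal(0.0, 0.1, (num_channels, hidden))
        # 1x1 convolution weights: one row per output channel.
        self.w_fuse = rng.normal(0.0, 0.1, (out_channels, num_channels))

    def __call__(self, x: np.ndarray):
        # x: (K, H, W) stack of symptom/knowledge channels.
        z = x.mean(axis=(1, 2))             # squeeze: per-channel mean, shape (K,)
        h = np.maximum(0.0, self.w1 @ z)    # bottleneck with ReLU
        a = sigmoid(self.w2 @ h)            # excitation: channel weights in (0, 1)
        weighted = x * a[:, None, None]     # rescale each channel by its weight
        # 1x1 convolution == per-pixel linear map from K channels to 3.
        y = np.einsum("ck,khw->chw", self.w_fuse, weighted)
        return y, a

    def channel_importance(self, a: np.ndarray) -> np.ndarray:
        # Hypothetical explanation score: attention weight times the total
        # magnitude of the fusion weights attached to that channel.
        return a * np.abs(self.w_fuse).sum(axis=0)


if __name__ == "__main__":
    x = np.random.default_rng(1).random((6, 32, 32))  # 6 hypothetical channels
    fusion = ChannelAttentionFusion(num_channels=6)
    y, a = fusion(x)
    print(y.shape, a.round(3), fusion.channel_importance(a).round(3))
```

Because the output has shape (3, H, W), it could be normalized like an RGB image and passed to an ImageNet-style backbone, while the per-channel weights `a` are the quantities an explanation would inspect.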