Visual question answering model for fruit tree disease decision-making based on multimodal deep learning.

Yubin Lan, Yaqi Guo, Qizhen Chen, Yuntong Chen, Shaoming Lin, Xiaoling Deng

doi:10.3389/fpls.2022.1064399

Abstract

Visual Question Answering (VQA) about diseases is an essential feature of intelligent management in smart agriculture. Current deep learning research on fruit tree diseases mainly relies on single-source data, such as visible-light images or spectral data, and yields classification and identification results that cannot be used directly for practical agricultural decision-making. In this study, a VQA model for fruit tree diseases based on multimodal feature fusion was designed. By fusing images with Q&A knowledge of disease management, the model answers questions about fruit tree disease images: it locates the image regions relevant to each question and returns a decision-making answer. The main contributions of this study are as follows: (1) a multimodal bilinear factorized pooling model using Tucker decomposition was proposed to fuse image features with question features; (2) a deep modular co-attention architecture was explored to learn image and question attention simultaneously, yielding richer graphical features and interactivity. The experiments showed that the proposed unified model, which combines the bilinear model and co-attentive learning in a new network architecture, achieved 86.36% decision-making accuracy with limited data (8,450 images and 4,560k Q&A pairs), outperforming existing multimodal methods. Data augmentation was applied to the training set to avoid overfitting, and ten runs of 10-fold cross-validation were used to obtain an unbiased performance estimate. The proposed multimodal fusion model achieved user-friendly interaction as well as fine-grained identification and decision-making performance, so it can be widely deployed in intelligent agriculture.
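Contribution (1) can be illustrated with a minimal sketch of Tucker-decomposed bilinear fusion in the style of MUTAN (Ben-younes et al., 2017). This is not the authors' implementation: the feature sizes, the random weights standing in for learned parameters, and the helper names `matvec`, `random_matrix`, and `tucker_fusion` are hypothetical, and the extra rank constraint MUTAN places on the core tensor is omitted. Each modality is first projected into a small factor space, and a core tensor then models the pairwise interactions between the two factor vectors, which needs far fewer parameters than a full bilinear map between the raw question and image features.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Hypothetical small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def tucker_fusion(q, v, W_q, W_v, core, W_o):
    """Fuse question vector q and image vector v with a Tucker-factorized bilinear map.

    Shapes: W_q is d_q x len(q), W_v is d_v x len(v),
    core is d_q x d_v x t_o, W_o is n_out x t_o.
    """
    # 1. Project each modality into its factor space and squash with tanh.
    q_t = [math.tanh(x) for x in matvec(W_q, q)]
    v_t = [math.tanh(x) for x in matvec(W_v, v)]
    # 2. Contract the core tensor with both factor vectors:
    #    z[k] = sum_i sum_j core[i][j][k] * q_t[i] * v_t[j]
    t_o = len(core[0][0])
    z = [0.0] * t_o
    for i, qi in enumerate(q_t):
        for j, vj in enumerate(v_t):
            pair = qi * vj
            for k in range(t_o):
                z[k] += core[i][j][k] * pair
    # 3. Map the fused vector to the output space (e.g. answer scores).
    return matvec(W_o, z)


if __name__ == "__main__":
    rng = random.Random(0)
    q_dim, v_dim = 12, 16  # hypothetical question/image feature sizes
    d_q, d_v, t_o, n_answers = 4, 4, 3, 5  # hypothetical factor and output sizes
    W_q = random_matrix(d_q, q_dim, rng)
    W_v = random_matrix(d_v, v_dim, rng)
    core = [[[rng.uniform(-0.1, 0.1) for _ in range(t_o)] for _ in range(d_v)]
            for _ in range(d_q)]
    W_o = random_matrix(n_answers, t_o, rng)
    q = [rng.gauss(0.0, 1.0) for _ in range(q_dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(v_dim)]
    scores = tucker_fusion(q, v, W_q, W_v, core, W_o)
    print(len(scores))  # 5: one score per hypothetical candidate answer
```

Because the fusion is bilinear in the projected factors, a zero question factor produces zero scores regardless of the image. In a trained VQA model, these scores would typically feed a softmax over candidate answers.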


Journal: Frontiers in Plant Science
Publication Date: Jan 5, 2023
Citations: 5
License type: CC BY 4.0

Similar Papers

Multi-modal Feature Fusion Based on Variational Autoencoder for Visual Question Answering
Liqing Chen ... Yilei Wang
01 Jan 2019

Visual Question Answering as Reading Comprehension
Hui Li ... Anton Van Den Hengel
01 Jun 2019

Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering
Zhou Yu ... Jun Yu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 29
09 Apr 2018

Multimodal feature fusion and exploitation with dual learning and reinforcement learning for recipe generation
Mengyang Zhang ... Ying Zhang
Applied Soft Computing | VOL. 126
09 Jul 2022
