Abstract

Wolfberry is an important cash crop in China and many other countries, but it is attacked by multiple pests, which makes its yield highly vulnerable. Identifying these pests is difficult because images of agricultural pests have complex backgrounds, and single-modal models cannot exploit the different data types available across modalities, which leads to low identification accuracy and poor data utilization. Traditional unimodal identification models therefore no longer meet the needs of multimodal data development in agriculture. To overcome these challenges, we propose ITF-WPI, a cross-modal feature fusion model that consists of two parallel branches: CoTN for images and ODLS for text. In CoTN, we incorporate the Contextual Transformer (CoT) structure, which focuses on contextual feature extraction, to make full use of the linearly fused static and dynamic contexts among adjacent keys. We further improve the 4-stage network of CoTN with Pyramid Squeezed Attention (PSA), which strengthens the extraction of multi-scale structural information and promotes the interaction of deep features with multi-scale spatial information. The ODLS network is built by stacking 1D convolutional and bidirectional LSTM layers. Experimental results show that it acquires text features more robustly than other advanced convolutional neural network-long short-term memory (CNN-LSTM) models, with 30% fewer multiply-accumulate operations (MACCs) than the best of those models. In a comprehensive comparison with classical state-of-the-art (SOTA) models, lightweight SOTA models, and advanced Transformer neural networks, ITF-WPI performed well, reaching an accuracy of 97.98%, an F1 score of 93.19%, a model size of 52.20 MB, and 7.828 G MACCs. The model has important practical value for promoting the development of cross-modal models in agriculture, supporting research on wolfberry pest control, and improving wolfberry yields.
The code and dataset for this study will be posted on GitHub (https://github.com/wemindful/Cross-modal-pest-Identifying) as soon as the study is released, and the dataset will be extended with new data in the future.
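
To make the MACCs metric quoted above more concrete, the following minimal sketch counts multiply-accumulate operations for a text branch that stacks one 1D convolution and one bidirectional LSTM layer. It uses the standard per-layer counting formulas with biases and activations ignored. All layer sizes (SEQ_LEN, EMBED_DIM, CONV_CHANNELS, KERNEL, HIDDEN) and both helper functions are hypothetical and chosen only for illustration; they are not the ODLS configuration, which the abstract does not specify.

```python
def conv1d_maccs(seq_len, in_channels, out_channels, kernel_size,
                 stride=1, padding=0):
    """Return (MACCs, output length) for one 1D convolution, bias ignored."""
    out_len = (seq_len + 2 * padding - kernel_size) // stride + 1
    # Each output element needs in_channels * kernel_size multiply-accumulates.
    return out_len * out_channels * in_channels * kernel_size, out_len


def lstm_maccs(seq_len, input_size, hidden_size, bidirectional=True):
    """Return MACCs for one LSTM layer, counting only the gate matmuls."""
    # Four gates, each with an input projection and a recurrent projection.
    per_step = 4 * hidden_size * (input_size + hidden_size)
    directions = 2 if bidirectional else 1
    return seq_len * per_step * directions


# Hypothetical sizes, NOT taken from the paper.
SEQ_LEN, EMBED_DIM, CONV_CHANNELS, KERNEL, HIDDEN = 64, 128, 64, 3, 128

conv_cost, conv_len = conv1d_maccs(SEQ_LEN, EMBED_DIM, CONV_CHANNELS,
                                   KERNEL, padding=1)
lstm_cost = lstm_maccs(conv_len, CONV_CHANNELS, HIDDEN)
total = conv_cost + lstm_cost

print(f"Conv1D MACCs: {conv_cost:,}")
print(f"BiLSTM MACCs: {lstm_cost:,}")
print(f"Total MACCs:  {total:,}")
```

With these toy sizes the recurrent layer dominates the count, which illustrates why reducing the channel width fed into the LSTM is one common way to lower MACCs in a CNN-LSTM stack. Whether ODLS obtains its reported reduction this way is not stated in the abstract.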
