Abstract
Field-scale crop yield prediction is critical to site-specific field management, and recent studies have advanced it by fusing multimodal data from unmanned aerial vehicles (UAVs). However, these studies stacked the modalities equally and underused canopy spatial information. In this study, multimodal imagery fusion (MIF) attention was proposed to dynamically fuse UAV-based RGB, hyperspectral near-infrared (HNIR), and thermal imagery. Based on the MIF attention, a novel model termed MultimodalNet was proposed for field-scale yield prediction of winter wheat. To compare methods based on multimodal imagery with methods based on multimodal features, a stacking-based ensemble learning model was built using UAV-based canopy spectral, thermal, and texture features. The results showed that MultimodalNet achieved accurate predictions at the reproductive stage and outperformed every single modality used in the fusion. MultimodalNet performed best at the flowering stage, with a coefficient of determination of 0.7411 and a mean absolute percentage error of 6.05%. The HNIR and thermal imagery were essential to yield prediction of winter wheat at the reproductive stage. Compared with equivalent stacking fusion, dynamic fusion that adaptively adjusts modality attention improved model accuracy and adaptability across winter wheat cultivars and water treatments. Equivalently stacking more modalities did not necessarily perform better than dynamically fusing fewer modalities. Methods using multimodal UAV imagery, which carries rich spatial information, were better suited to field-scale yield prediction than methods using multimodal features. This study indicates that MultimodalNet is a powerful tool for field-scale yield prediction of winter wheat.
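For readers unfamiliar with the two accuracy metrics reported above, the following is a minimal sketch of how a coefficient of determination and a mean absolute percentage error are conventionally computed. It uses the standard textbook definitions, not code from the study; the function names and the sample yield values are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every observed value is non-zero, which holds for crop yields.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical observed and predicted plot yields (illustrative values only).
observed = [6.0, 7.0, 8.0]
predicted = [6.3, 6.8, 8.4]

print(f"R^2  = {r_squared(observed, predicted):.4f}")
print(f"MAPE = {mape(observed, predicted):.2f}%")
```

A higher coefficient of determination and a lower percentage error both indicate closer agreement between predicted and observed yield.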