Abstract

The government and farmers' organizations are paying much attention to the problem of predicting how much rice can be produced each year. However, it is difficult to accurately predict the yield of rice due to variable factors such as extreme climate change and various pests and diseases that change every year. In this study, images were collected several times during the growing season of rice through a multi-spectral sensor mounted on an unmanned aerial vehicle, and rice yield was predicted using a deep learning algorithm. Multispectral images can be viewed as a kind of image data taken several times at regular intervals, and rice yield was predicted using the Video Vision Transformer (ViViT) model, which applies the Transformer structure to image computer vision among deep learning algorithms. The ViViT model generates patches by dividing the input image into a certain size, and as a result of learning the model by setting the size of these patches differently, it was found that the smaller the patch size, the better the predictive power. In addition, as a result of comparing prediction performance with a 3D CNN model that receives an image as an input in a CNN (Convolutional Neural Network) structure used in the image processing field, it was found that the ViViT model using a small patch size performed better. As a result of visualizing the weight matrix of the ViViT model as a heat map, images taken in mid- to late August appear to be important in yield prediction, making it possible to predict yield about two months before rice harvest.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call