Abstract

This paper evaluates the feasibility and performance of two related applications in agricultural big data: hyperspectral imaging and parallel computing. Two oilseed rape varieties (NingYou 22 and NingZa 19) were selected as the objects of study, and hyperspectral images of their siliques were captured after exposure to three waterlogging stress levels (0, 3, and 6 days). The machine learning library for Spark was used to implement artificial neural network (ANN) and support vector machine (SVM) classification algorithms and to conduct hyperspectral classification analysis of the siliques under the different stress levels on the parallel computing platform. From the classification data sets, 70% of the data were randomly selected for training, and the remaining 30% were used for prediction. The experimental results indicate that, when the hyperspectral image of a region of interest (400–1000 nm) was extracted and combined with the spectrum image data, the oilseed rape waterlogging detection model based on the Spark parallel computing framework was feasible and efficient. For the multi-class classification problem, the ANN algorithm was more accurate than the SVM algorithm, but it took longer to converge. For binary classification of the samples from both varieties, the SVM algorithm performed better. Of the two varieties, the NingZa 19 waterlogging samples yielded better classification results. Five optimal wavebands (512, 621, 689, 953, and 961 nm) were selected as inputs for the classification algorithm. The results show that classification accuracy with the full waveband range was slightly higher than with the optimal wavebands, and that the ANN algorithm was more accurate than the SVM algorithm. Finally, three indices (speedup, scaleup, and sizeup) were used to evaluate the runtime performance of the classification algorithms on the hyperspectral data set on the Spark parallel computing platform.
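
The abstract names speedup, scaleup, and sizeup without defining them. The following is a minimal sketch, assuming the definitions commonly used when evaluating parallel algorithms; the paper itself may define them differently. The function names and all timing values are hypothetical placeholders chosen for illustration and are not results from the study.

```python
def _check_positive(*times):
    """Reject non-positive runtimes, which would make the ratios meaningless."""
    for t in times:
        if t <= 0:
            raise ValueError("runtimes must be positive")


def speedup(t_one_node, t_m_nodes):
    """Same data set, 1 node vs. m nodes. Ideal value is m."""
    _check_positive(t_one_node, t_m_nodes)
    return t_one_node / t_m_nodes


def sizeup(t_base_data, t_scaled_data):
    """Same number of nodes, data set grown by a factor k.
    Values below k mean runtime grows more slowly than the data."""
    _check_positive(t_base_data, t_scaled_data)
    return t_scaled_data / t_base_data


def scaleup(t_base, t_scaled_both):
    """1 node on the base data vs. m nodes on m times the data.
    Ideal value is 1.0 (runtime stays constant)."""
    _check_positive(t_base, t_scaled_both)
    return t_base / t_scaled_both


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds (assumed, not measured).
    print("speedup:", speedup(t_one_node=120.0, t_m_nodes=40.0))
    print("sizeup:", sizeup(t_base_data=40.0, t_scaled_data=100.0))
    print("scaleup:", scaleup(t_base=120.0, t_scaled_both=150.0))
```

Under these assumed definitions, speedup isolates the effect of adding nodes, sizeup isolates the effect of growing the data, and scaleup varies both together, which is why the three are usually reported side by side.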
