Machine Learning in the Classification of Soybean Genotypes for Primary Macronutrients’ Content Using UAV–Multispectral Sensor

Dthenifer Cordeiro Santana,Larissa Pereira Ribeiro Teodoro,Fábio Henrique Rojo Baio,Luciano Shozo Shiratsuchi,Paulo Henrique Menezes Das Chagas,Marcelo Carvalho Minhoto Teixeira Filho,Carlos Antonio Da Silva Junior,Cid Naudi Silva Campos,Paulo Eduardo Teodoro,João Lucas Gouveia De Oliveira,Marcelo Rinaldi Da Silva

doi:10.3390/rs15051457

Abstract

Using spectral data to quantify nitrogen (N), phosphorus (P), and potassium (K) contents in soybean plants can help breeding programs develop fertilizer-efficient genotypes. Employing machine learning (ML) techniques to classify these genotypes according to their nutritional content makes the analyses performed in the programs even faster and more reliable. Thus, the objective of this study was to find the best ML algorithm(s) and input configurations in the classification of soybean genotypes for higher N, P, and K leaf contents. A total of 103 F2 soybean populations were evaluated in a randomized block design with two repetitions. At 60 days after emergence (DAE), spectral images were collected using a Sensefly eBee RTK fixed-wing remotely piloted aircraft (RPA) with autonomous take-off, flight plan, and landing control. The eBee was equipped with the Parrot Sequoia multispectral sensor. Reflectance values were obtained in the following spectral bands (SBs): red (660 nm), green (550 nm), NIR (735 nm), and red-edge (790 nm), which were used to calculate the vegetation index (VIs): normalized difference vegetation index (NDVI), normalized difference red edge (NDRE), green normalized difference vegetation index (GNDVI), soil-adjusted vegetation index (SAVI), modified soil-adjusted vegetation index (MSAVI), modified chlorophyll absorption in reflectance index (MCARI), enhanced vegetation index (EVI), and simplified canopy chlorophyll content index (SCCCI). At the same time of the flight, leaves were collected in each experimental unit to obtain the leaf contents of N, P, and K. The data were submitted to a Pearson correlation analysis. Subsequently, a principal component analysis was performed together with the k-means algorithm to define two clusters: one whose genotypes have high leaf contents and another whose genotypes have low leaf contents. Boxplots were generated for each cluster according to the content of each nutrient within the groups formed, seeking to identify which set of genotypes has higher nutrient contents. Afterward, the data were submitted to machine learning analysis using the following algorithms: decision tree algorithms J48 and REPTree, random forest (RF), artificial neural network (ANN), support vector machine (SVM), and logistic regression (LR, used as control). The clusters were used as output variables of the classification models used. The spectral data were used as input variables for the models, and three different configurations were tested: using SB only, using VIs only, and using SBs+VIs. The J48 and SVM algorithms had the best performance in classifying soybean genotypes. The best input configuration for the algorithms was using the spectral bands as input.

Full Text