Corn grain moisture (CGM) is critical to estimate grain maturity status and schedule harvest. Traditional methods for determining CGM range from manual scouting, destructive laboratory analyses, and weather-based dry down estimates. Such methods are either time consuming, expensive, spatially inaccurate, or subjective, therefore they are prone to errors or limitations. Realizing that precision harvest management could be critical for extracting the maximum crop value, this study evaluates the estimation of CGM at a pre-harvest stage using high-resolution (1.3 cm/pixel) multispectral imagery and machine learning techniques. Aerial imagery data were collected in the 2022 cropping season over 116 experimental corn planted plots. A total of 24 vegetation indices (VIs) were derived from imagery data along with reflectance (REF) information in the blue, green, red, red-edge, and near-infrared imaging spectrum that was initially evaluated for inter-correlations as well as subject to principal component analysis (PCA). VIs including the Green Normalized Difference Index (GNDVI), Green Chlorophyll Index (GCI), Infrared Percentage Vegetation Index (IPVI), Simple Ratio Index (SR), Normalized Difference Red-Edge Index (NDRE), and Visible Atmospherically Resistant Index (VARI) had the highest correlations with CGM (r: 0.68–0.80). Next, two state-of-the-art statistical and four machine learning (ML) models (Stepwise Linear Regression (SLR), Partial Least Squares Regression (PLSR), Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and K-nearest neighbor (KNN)), and their 120 derivates (six ML models × two input groups (REFs and REFs+VIs) × 10 train–test data split ratios (starting 50:50)) were formulated and evaluated for CGM estimation. The CGM estimation accuracy was impacted by the ML model and train-test data split ratio. However, the impact was not significant for the input groups. For validation over the train and entire dataset, RF performed the best at a 95:5 split ratio, and REFs+VIs as the input variables (rtrain: 0.97, rRMSEtrain: 1.17%, rentire: 0.95, rRMSEentire: 1.37%). However, when validated for the test dataset, an increase in the train–test split ratio decreased the performances of the other ML models where SVM performed the best at a 50:50 split ratio (r = 0.70, rRMSE = 2.58%) and with REFs+VIs as the input variables. The 95:5 train–test ratio showed the best performance across all the models, which may be a suitable ratio for relatively smaller or medium-sized datasets. RF was identified to be the most stable and consistent ML model (r: 0.95, rRMSE: 1.37%). Findings in the study indicate that the integration of aerial remote sensing and ML-based data-run techniques could be useful for reliably predicting CGM at the pre-harvest stage, and developing precision corn harvest scheduling and management strategies for the growers.
Read full abstract