Ensemble and single algorithm models to handle multicollinearity of UAV vegetation indices for predicting rice biomass

Radhwane Derraz, Farrah Melissa Muharam, Khairudin Nurulhuda, Noraini Ahmad Jaafar, Ng Keng Yap

doi:10.1016/j.compag.2023.107621

Abstract

Rice biomass is a biofuel source and a yield indicator. Conventional sampling methods predict rice biomass accurately, but they are destructive, time-consuming, expensive, and labour-intensive. Unmanned aerial vehicles (UAVs) overcome these shortcomings by providing vegetation indices (VIs) that are sensitive to rice attributes. However, VIs are collinear, and their analysis requires machine learning algorithms (MLs). The analysis of collinear VIs using base (single) and ensemble MLs has yet to be investigated. Therefore, this study compares the model performance, variance, stability (under/overfitting), and confidence of base and ensemble MLs for rice biomass prediction in a multicollinearity context (MCC) and a non-multicollinearity context (NMCC). To that end, a randomised complete block design experiment was conducted in the IADA KETARA rice granary in Terengganu, Malaysia. The experiment yielded 360 samples of five biomass traits, five spectral bands, and ninety VIs. For all rice biomass traits, the MLs' model performance and under/overfitting behaviour were better in MCC than in NMCC. The ensemble MLs outperformed the base MLs for all rice biomass traits in both MCC and NMCC. All base and ensemble MLs showed inconsistent patterns of R² and RMSE variance across MCC and NMCC. Finally, neither multicollinearity nor the base-versus-ensemble distinction affected model confidence; rather, confidence depended on the cross-effects of the ML and dataset characteristics. The study thus reveals how sensitive different base and ensemble MLs are to multicollinearity in terms of model performance, stability, variance, and confidence.
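To illustrate why VIs tend to be collinear, the following minimal sketch builds two widely used indices from the same red and near-infrared bands and measures their redundancy. It is not the study's procedure: the reflectance values are synthetic, the sample count of 360 is borrowed only for scale, and the variance inflation factor (VIF) is used here as one common multicollinearity diagnostic, an assumption rather than a method stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic reflectance values (hypothetical, NOT the study's data).
n_samples = 360
red = rng.uniform(0.03, 0.15, size=n_samples)
nir = rng.uniform(0.30, 0.60, size=n_samples)

# Two common vegetation indices derived from the same two bands.
ndvi = (nir - red) / (nir + red)  # normalised difference vegetation index
sr = nir / red                    # simple ratio


def vif(features):
    """Variance inflation factor per column of an (n_samples, n_features) array.

    Computed as the diagonal of the inverse of the feature correlation matrix.
    """
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))


features = np.column_stack([ndvi, sr, red])
pearson_r = np.corrcoef(ndvi, sr)[0, 1]
vifs = vif(features)

print(f"Pearson r between NDVI and SR: {pearson_r:.3f}")
print("VIF for [NDVI, SR, red]:", np.round(vifs, 2))
```

Because NDVI is a monotonic function of the simple ratio, the two indices carry largely the same information, which is the kind of redundancy that arises when many VIs are computed from only a handful of spectral bands.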

Full Text