Potassium is a crucial soil nutrient for crop growth, and measuring it efficiently is essential for developing rational fertilization plans and optimizing the benefits of crop growth. Data mining based on near-infrared (NIR) spectroscopy analysis has proven to be a powerful tool for real-time monitoring of soil potassium content. However, as technology and instruments improve, the curse of dimensionality becomes more pronounced, so efficient variable selection methods suited to NIR spectroscopy analysis are urgently needed. In this study, we proposed a three-step progressive hybrid variable selection strategy that leverages the respective strengths of several high-performance variable selection methods. Synergy interval partial least squares (SiPLS), random forest variable importance measurement (RF(VIM)), and the improved mean impact value algorithm (IMIV) were applied sequentially within a fusion framework to select the variables important for soil potassium; the resulting method is termed SiPLS-RF(VIM)-IMIV. Finally, the selected variables were used to fit a partial least squares (PLS) model. Experimental results demonstrated that the PLS model combined with the hybrid strategy improved prediction performance while reducing model complexity. The RMSE_T and R_T on the test set were 0.01181% and 0.88246, respectively, better than those of the full-spectrum PLS, SiPLS, and SiPLS-RF(VIM) methods. This study demonstrated that combining NIR spectroscopy data with the SiPLS-RF(VIM)-IMIV method can quantitatively analyze soil potassium content and could potentially address other problems in data-driven dynamic soil monitoring.
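The two test-set metrics reported above can be illustrated with a minimal sketch. It assumes that RMSE_T is the root mean square error on the test set and that R_T denotes the Pearson correlation coefficient between measured and predicted values; the abstract does not define R_T, so the second reading is an assumption. The function names and sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient (assumed meaning of R_T)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


if __name__ == "__main__":
    # Hypothetical potassium contents (%), for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print("RMSE_T:", rmse(measured, predicted))
    print("R_T:", pearson_r(measured, predicted))
```

A lower RMSE_T together with a higher R_T is the sense in which one model is described as better than another in the comparison above.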