Predicting Soil Properties and Interpreting Vis-NIR Models from across Continental United States.

Christopher M Clingensmith, Sabine Grunwald

doi:10.3390/s22093187

Open Access

Journal: Sensors (Basel, Switzerland)	Publication Date: Apr 21, 2022
Citations: 12	License type: CC BY 4.0

Affiliation: University of Florida

Abstract

The United States Natural Resources Conservation Service (NRCS) maintains a soil database with data collected across the country over the last several decades, including soil spectral scans. These data are available but may not be used to their full potential. For this study, pedon, horizon, and spectral data were extracted from the database for samples collected from 2011 to 2015. Only sites that had been fully described and horizons that had been analyzed for the full suite of desired properties were used. This yielded over 14,000 samples for modeling eight soil properties: soil organic carbon (SOC); total nitrogen (TN); total sulfur (TS); clay; sand; exchangeable calcium (Caex); cation exchange capacity (CEC); and pH. Four chemometric methods were employed for soil property prediction: partial least squares regression (PLSR); Random Forest (RF); Cubist; and multivariate adaptive regression splines (MARS). Because the dataset was sufficiently large, simple random subsetting was used to create calibration (70%) and validation (30%) sets. SOC, TN, and TS had the strongest prediction results, with R² values over 0.9. Caex, CEC, and pH were predicted moderately well. Clay and sand models had slightly lower performance. Of the four methods, Cubist produced the strongest models, while PLSR produced the weakest. This may be due to complex relationships between soil properties and spectra that PLSR could not capture. The only drawback of Cubist is that its variable importance is difficult to interpret. Future research should include environmental variables to improve prediction results. Future work may also avoid PLSR when dealing with large datasets that cover large areas and have high degrees of variability.
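
The random 70/30 calibration/validation split and the R² metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `random_split` and `r_squared`, the seed, and the round figure of 14,000 samples (the abstract says only "over 14,000") are assumptions made for illustration, and no spectral data or chemometric model is included.

```python
import random


def random_split(n_samples, calibration_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into calibration and validation sets.

    Hypothetical helper; the seed and rounding rule are illustrative assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_cal = round(n_samples * calibration_fraction)
    return indices[:n_cal], indices[n_cal:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative sample count only (assumption, see lead-in).
cal_idx, val_idx = random_split(14000)
```

In practice, a model fitted on the calibration indices would be scored by passing the observed and predicted values for the validation indices to `r_squared`; a value near 1 indicates that predictions explain most of the variance in the held-out samples.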

Full Text