Soil spectral reflectance is a necessary input for land surface and radiative transfer models, and can be used to infer soil properties. Numerous soil reflectance models have been developed using either mechanistic or data-driven approaches, each with its own limitations. Mechanistic models derived from radiative transfer theory usually rely on only a few input soil properties, whereas data-driven approaches are limited by the high non-uniformity of available published datasets, which severely limits the amount of data usable for model calibration. To address these limitations, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra from soil property inputs was developed based on the denoising diffusion probabilistic model (DDPM). The model was trained on an extensive dataset comprising nearly 180,000 pairs of soil spectra and property sets from 17 published datasets. The model generates soil reflectance spectra from text-based inputs describing soil properties and their values, rather than only from numerical values and labels in binary vector format, which means the model can handle variable formats for property reporting. Because the model is generative, it can simulate reasonable output spectra from an incomplete set of available input properties, and the output becomes more reliable as the input property set becomes more complete. Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra that do not cover the full target solar range (400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM can also be easily integrated with other soil–plant radiation models used for remote sensing research, such as PROSAIL and the Helios 3D plant modeling software.
Tests of the SOGM on new datasets not included in model training demonstrated that the model can generate reasonable soil reflectance spectra based on available property inputs. The results also show that soil clay/sand/silt fraction, organic carbon content, nitrogen content, and iron content tended to be important properties for spectra simulation. Inclusion of some trace minerals, such as nickel, as model inputs decreased model performance because of their low concentrations and high propensity for ground-truth measurement error.
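For readers unfamiliar with the DDPM framework on which the SOGM is built, the forward (noising) step of a generic DDPM can be sketched as follows. This is a minimal illustration and not the authors' implementation: the linear beta schedule, the 1000-step horizon, the function names, and the synthetic linear-ramp "spectrum" are all assumptions made for the example. Only the 400 to 2499 nm wavelength range comes from the abstract. In a conditional DDPM, a network would be trained to predict the added noise given the noisy spectrum, the step index, and the property inputs; that network is omitted here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps


# Hypothetical input: a synthetic reflectance ramp at 1 nm resolution
# over the 400 to 2499 nm range named in the abstract (not real soil data).
wavelengths = list(range(400, 2500))
spectrum = [0.1 + 0.3 * (w - 400) / 2099.0 for w in wavelengths]

betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alpha(betas)
rng = random.Random(0)

# Noisy version of the spectrum halfway through the assumed schedule.
noisy, eps = forward_noise(spectrum, 500, alpha_bar, rng)
```

Generation runs this process in reverse: starting from pure noise, the trained network removes a little noise at each step until a spectrum consistent with the supplied properties remains.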