A variety of data-driven, mostly supervised, machine learning approaches have been used to model landforms in soil and regolith sciences, commonly with a claim of enhanced objectivity of the resulting map. These models regularly rely on soil sample measurements or existing human-derived mapping products to train or retrospectively validate a model. Case studies of unsupervised machine learning approaches are less common, and both the input data and the clustering algorithms vary widely. In this study, a relatively simple, unsupervised machine learning approach was used to create a proxy landform map from partially independent, remotely sensed data (digital elevation model, radiometric U, Th and K, Sentinel-2-derived band ratios, and Multi-resolution Valley Bottom Flatness). This machine learning workflow was developed for general, first-pass landform mapping in remote areas where access is limited, to provide tools for mineral exploration. The workflow was designed for the Australian continent and has previously been applied to over 40 sites. However, given that the models were not trained or retrospectively validated with objective observations, the question arises whether the units identified represent meaningful differences in soil and landform properties. To answer this question, conditioned Latin Hypercube Sampling was used to identify sampling locations that capture the variability of properties of eight landform clusters produced by the machine learning workflow in the Mundaring State Forest, Western Australia. Soil cores (30 cm depth) were collected at the 40 sampling locations identified, and portable X-ray fluorescence, visible near-infrared to shortwave infrared analyses, soil pH and field observations were combined to identify differences between the modelled landform types and to determine how the soil physico-chemical and mineralogical properties relate to the model’s input feature layers.
Our investigations show that the model produced largely contiguous landform units, with distinct differences that were reflected in the measured averages of geochemical and mineralogical soil properties. For example, the highest Si concentrations were associated with sandy channel materials, Mn and Fe concentrations were highest in ferruginous duricrust, and white mica and chlorite group minerals were identified in shallow residual soils developed from granitic parent material. These results correlate well with the model input features and align with a general conceptual understanding of soil-landscape and regolith landform formation, and with existing soil and regolith maps. However, some inconsistencies were observed within the landform unit clusters. These likely capture the heterogeneity of the landforms and soils, and they illustrate the limitations of categorical models.
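The idea behind the conditioned Latin Hypercube Sampling step can be illustrated with a minimal sketch. It is not the workflow used in this study. It assumes continuous feature values only, and it replaces the simulated annealing search of the published algorithm with a simple greedy swap search. All names (`clhs_sketch`, `pixels`) and the synthetic data are hypothetical. Each variable is split into as many equal-count strata as there are samples, and the search tries to place exactly one sample in every stratum of every variable.

```python
import random

def quantile_edges(values, n_strata):
    """Return n_strata + 1 edges that split `values` into roughly equal-count strata."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(i * last / n_strata)] for i in range(n_strata + 1)]

def stratum_of(value, edges):
    """Index of the stratum containing `value` (the last stratum is closed on the right)."""
    for k in range(len(edges) - 2):
        if value < edges[k + 1]:
            return k
    return len(edges) - 2

def objective(sample_idx, data, edges_per_var):
    """Sum over variables and strata of |count - 1|; zero means a perfect Latin hypercube."""
    n = len(sample_idx)
    total = 0
    for v, edges in enumerate(edges_per_var):
        counts = [0] * n
        for i in sample_idx:
            counts[stratum_of(data[i][v], edges)] += 1
        total += sum(abs(c - 1) for c in counts)
    return total

def clhs_sketch(data, n_samples, n_iter=5000, seed=0):
    """Greedy swap search (assumption: stands in for simulated annealing)."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    edges_per_var = [
        quantile_edges([row[v] for row in data], n_samples) for v in range(n_vars)
    ]
    current = rng.sample(range(len(data)), n_samples)
    current_cost = objective(current, data, edges_per_var)
    for _ in range(n_iter):
        if current_cost == 0:
            break
        new = rng.randrange(len(data))
        if new in current:
            continue  # keep sampled locations distinct
        candidate = current[:]
        candidate[rng.randrange(n_samples)] = new
        cost = objective(candidate, data, edges_per_var)
        if cost <= current_cost:  # accept only non-worsening swaps
            current, current_cost = candidate, cost
    return current, current_cost

# Synthetic stand-in for the feature values of pixels in one landform cluster.
data_rng = random.Random(1)
pixels = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(500)]
chosen, cost = clhs_sketch(pixels, n_samples=5)
print(chosen, cost)
```

The sample size of five and the three features are arbitrary illustrative values. A fuller implementation would also handle categorical layers and preserve the correlation between variables.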