Abstract

Selection of training samples plays an important role in updating conventional soil maps with data mining models. In this paper, we developed a method that determines the spatial locations of training samples through spatial neighborhood analysis of environmental covariates within each soil polygon. Training samples were selected based on either a single environmental variable or an integrated variable derived from multiple variables. A sensitivity analysis tested the effect of spatial neighborhood size and the number of selected samples on soil mapping accuracy. The proposed method was compared with random selection of training samples from soil polygons and from soil types in the Raffelson watershed in La Crosse, Wisconsin, USA. Random forest was adopted as the soil prediction model. Results showed that training samples selected with the proposed method using a single variable such as Topographic Wetness Index (TWI), slope, plan curvature, profile curvature or slope length factor improved the overall mapping accuracy relative to the conventional soil map; TWI achieved the highest improvement, 27%. The proposed method using TWI, slope or slope length factor performed better than the random selection strategies. Random selection from soil polygons generated higher overall mapping accuracy than random selection from soil types. We recommend applying the proposed method with composite environmental variables that represent the soil-forming environment of the study area well. The proposed method is not sensitive to the number of selected samples, but it requires an appropriate neighborhood size. For our study area, which has small spatial coverage, a neighborhood size of 5 × 5 or 3 × 3 is recommended.
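
To make the idea of neighborhood-based sample selection concrete, the following is a minimal sketch under stated assumptions, not the procedure used in the paper. The abstract does not specify the neighborhood statistic or the selection rule, so the sketch assumes one plausible reading: within each soil polygon, prefer the cells whose surrounding window of a covariate (for example TWI) is most homogeneous, measured by the population standard deviation. The function names `neighborhood_std` and `select_samples`, the toy rasters, and the tie-breaking rule are all hypothetical.

```python
from statistics import pstdev


def neighborhood_std(grid, row, col, size):
    """Population std. dev. of the covariate in a size x size window
    centred on (row, col). Windows are clipped at the raster edge."""
    half = size // 2
    values = [
        grid[r][c]
        for r in range(max(0, row - half), min(len(grid), row + half + 1))
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1))
    ]
    return pstdev(values)


def select_samples(covariate, polygons, size=3, n_samples=2):
    """Assumed rule: per polygon, keep the n_samples cells with the lowest
    neighborhood std. dev. (ties broken by row, then column)."""
    scored = {}
    for r, row in enumerate(polygons):
        for c, polygon_id in enumerate(row):
            score = neighborhood_std(covariate, r, c, size)
            scored.setdefault(polygon_id, []).append((score, r, c))
    return {
        polygon_id: [(r, c) for _, r, c in sorted(cells)[:n_samples]]
        for polygon_id, cells in scored.items()
    }


# Toy 4 x 4 rasters (made-up values): one covariate layer and polygon ids.
covariate = [
    [1.0, 1.0, 1.0, 5.0],
    [1.0, 1.0, 1.0, 9.0],
    [1.0, 1.0, 1.0, 2.0],
    [4.0, 8.0, 3.0, 7.0],
]
polygons = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]

selected = select_samples(covariate, polygons, size=3, n_samples=2)
```

In this sketch, `size` plays the role of the neighborhood size (3 × 3 or 5 × 5 in the abstract) and `n_samples` the number of samples per polygon. The selected cell locations would then supply the covariate values and polygon soil types used to train a classifier such as random forest.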
