Abstract

The universal soil loss equation (USLE) is a widely used empirical model for estimating soil loss. Among the USLE factors, the cover management factor (C-factor) substantially affects the estimation result. The traditional approach assigns C-factor values according to a land-use/land-cover (LULC) map derived from field surveys. However, conducting field surveys and updating the LULC map regularly is difficult and costly, which significantly limits the feasibility of multi-temporal analysis of soil erosion. To address this issue, this study uses data mining to build a random forest (RF) model relating eight geospatial factors to the C-factor for the Shihmen Reservoir watershed in northern Taiwan, enabling multi-temporal estimation of soil loss. The eight geospatial factors were collected or derived from remotely sensed images taken in 2004, a digital elevation model, and related digital maps. Due to the memory size limitation of the R software, only 4% of the total data points (population dataset) in each C-factor class were selected by stratified random sampling as the sample dataset (input dataset) for analysis. Seventy percent of the input dataset was used to train the RF model, and the remaining 30% was used to test it. The results show that, in multi-temporal analysis, the RF model could capture the trend of vegetation recovery and soil loss reduction after the destructive event of Typhoon Aere in 2004. Although the RF model was biased by the large sample size of the majority class (C = 0.01 class), the estimated soil erosion rate was close to the measurement obtained by the erosion pins installed in the watershed (90.6 t/ha-year). After completing the model, we addressed the imbalanced data problem in the input dataset to improve the model’s classification performance.
An ad hoc technique of down-sampling the majority class was used to reduce the majority class’s sampling rate to 2%, 1%, and 0.5% while keeping the minority classes at a 4% sampling rate. The results show an improvement of the Kappa coefficient from 0.574 to 0.732, the AUC from 0.780 to 0.891, and the true positive rate of all minority classes combined from 0.43 to 0.70. However, the overall accuracy decreased from 0.952 to 0.846, and the true positive rate of the majority class declined from 0.99 to 0.94. The best average C-factor was achieved when the sampling rate of the majority class was 1%, whereas the best soil erosion estimate was obtained when the sampling rate was 2%.
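
The sampling scheme described above can be illustrated with a minimal sketch. It is written in Python rather than the R software used in the study, and it is not the authors' code: the function names, the toy class labels ("minority_a", "minority_b"), and the toy class sizes are all assumptions made for illustration. It draws a stratified random sample at 4% per class, optionally lowers the rate for the majority class (here to 1%), and then splits the sample 70/30 into training and test sets.

```python
import random
from collections import defaultdict


def stratified_sample(labels, rates, default_rate=0.04, seed=0):
    """Sample indices class by class.

    `rates` maps a class label to its sampling rate; classes not listed
    fall back to `default_rate` (4%, as in the abstract).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label, indices in by_class.items():
        rate = rates.get(label, default_rate)
        k = max(1, round(len(indices) * rate))  # keep at least one point per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def split_train_test(indices, train_fraction=0.7, seed=0):
    """Shuffle the sampled indices and split them into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy population (hypothetical sizes): one dominant class and two small ones.
labels = ["C=0.01"] * 10000 + ["minority_a"] * 500 + ["minority_b"] * 500

baseline = stratified_sample(labels, rates={})                   # 4% of every class
downsampled = stratified_sample(labels, rates={"C=0.01": 0.01})  # majority at 1%
train, test = split_train_test(downsampled)

print(len(baseline), len(downsampled), len(train), len(test))
```

In this toy population, lowering the majority rate from 4% to 1% raises the minority share of the sample from 40 of 440 points to 40 of 140. That is the rebalancing effect the down-sampling is meant to achieve, at the cost of fewer majority-class examples.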

Highlights

  • Severe soil erosion will increase soil sedimentation and severely reduce the water storage and supply capabilities of reservoirs

  • According to the data preprocessing and study procedures described in previous sections, the C-factor random forest (RF) model was constructed using the training data and tested with the test data

  • Unlike a previous study [30], which only used at most 100 points from each C-factor class and substantially overestimated soil erosion, we tried to maximize the number of data points that could be processed to train the RF model given the memory size limitation of R

Introduction

Severe soil erosion increases soil sedimentation and severely reduces the water storage and supply capabilities of reservoirs. Soil erosion has been one of the core topics in agriculture, natural resources conservation, and other related fields since the end of the 1920s [2]. A significant trend in soil erosion research has been the development of various measurement and prediction models for different locations and applications, such as AGNPS (agricultural non-point source pollution model [3]), CREAMS (chemicals, runoff and erosion from agricultural management systems [4]), EPIC (erosion-productivity impact calculator [5]), SWRRBWQ (simulator for water resources in rural basins-water quality [6]), WEPP (water erosion prediction project [7]), and USLE (universal soil loss equation [8,9]).
