Abstract

One of the main challenges in digital soil mapping is the imbalance of datasets used for soil classification. With such datasets, machine learning techniques tend to overestimate the majority classes and underestimate the minority ones. In general, this produces maps with poor precision and unrealistic results. Relying on these maps for land use decision-making can have dire consequences. This is the case for acid sulfate (AS) soils, a type of harmful soil that can cause serious environmental damage when drained for agricultural or forestry activities. In the study area, the probability of finding AS soils is very high. Furthermore, some of the most hazardous AS soils in Finland are located there [1]. Therefore, high-precision maps are needed to avoid environmental damage. Since the dataset for this region is highly imbalanced, the first step in creating accurate maps is to balance the dataset. Although most soil class datasets in nature are imbalanced, this problem has hardly been studied. In this work, we analyze different techniques to address the problem of imbalanced datasets. The methods considered to balance the dataset are undersampling, oversampling, and the combination of both. For the oversampling of the minority class, we create artificial samples from the Quaternary geological map. The method used for the modeling is Random Forest, one of the best methods for the classification of AS soils [2,3]. Balancing the dataset improves the performance of the model in all the studied cases, with metric values above 80% for both classes. Furthermore, we create AS soil probability maps for the four balanced datasets and the imbalanced dataset, and we compare the maps in detail. In addition, the extent of the AS soils obtained in all the cases is compared with the extent of the AS soils in the conventionally produced occurrence map [1].
The modeled probability maps created from the balanced datasets have high precision. The results of this study confirm the importance of balancing the dataset to improve the prediction and classification of AS soils.

[1] Geological Survey of Finland. Acid Sulfate Soils–map services. http://gtkdata.gtk.fi/hasu/index.html
[2] V. Estévez et al. 2022. “Machine learning techniques for acid sulfate soil mapping in southeastern Finland”. Geoderma 406, 115446.
[3] V. Estévez et al. 2023. “Improving prediction accuracy for acid sulfate soil mapping by means of variable selection”. Front. Environ. Sci. 11:1213069. doi: 10.3389/fenvs.2023.1213069
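
The balancing strategies named in the abstract (undersampling the majority class, oversampling the minority class with artificial samples, and combining both) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `undersample` and `oversample`, the class labels `"major"` and `"minor"`, the toy one-feature data, and the sample counts are hypothetical and are not taken from the study. In particular, the artificial samples are simply passed in as a list, whereas the abstract derives them from the Quaternary geological map.

```python
import random
from collections import Counter


def undersample(X, y, majority, n_keep, seed=0):
    """Randomly keep n_keep samples of the majority class; keep all others."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    keep = set(rng.sample(majority_idx, n_keep))
    idx = [i for i, label in enumerate(y) if label != majority or i in keep]
    return [X[i] for i in idx], [y[i] for i in idx]


def oversample(X, y, minority, artificial):
    """Append externally generated artificial samples to the minority class."""
    artificial = list(artificial)
    return X + artificial, y + [minority] * len(artificial)


# Hypothetical toy dataset: 90 majority samples and 10 minority samples.
X = [[float(i)] for i in range(100)]
y = ["major"] * 90 + ["minor"] * 10

# Hypothetical artificial minority samples (placeholder feature values).
artificial = [[100.0 + i] for i in range(20)]

# Combined strategy: undersample the majority, then oversample the minority.
X_u, y_u = undersample(X, y, "major", n_keep=30)
X_b, y_b = oversample(X_u, y_u, "minor", artificial)

counts = Counter(y_b)
print(counts)
```

The balanced lists `X_b` and `y_b` could then be passed to a classifier such as the Random Forest mentioned in the abstract. The sketch stops before that step to stay dependency-free.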
