Predicting Soil Textural Classes Using Random Forest Models: Learning from Imbalanced Dataset

Sina Mallah, Alireza Amirian-Chakan, Thomas Scholten, Ruth Kerry, Ruhollah Taghizadeh-Mehrjardi, Mostafa Emadi, Amir Hosein Mosavi, Naser Davatgar, Bahareh Delsouz Khaki

doi:10.3390/agronomy12112613

Abstract

Soil provides a key interface between the atmosphere and the lithosphere and plays an important role in food production, ecosystem services, and biodiversity. Demand for machine learning (ML) methods that improve the understanding of soil behavior has recently increased. Real-world datasets are inherently imbalanced, however, and ML models trained on them tend to overestimate the majority classes and underestimate the minority ones. The aim of this study was to investigate the effects of imbalance in the training data on the performance of a random forest (RF) model. The original (imbalanced) dataset comprised 6100 soil texture observations from the surface layer of agricultural fields in northern Iran. The synthetic minority oversampling technique (SMOTE) was used to create a balanced dataset from the original data. Bioclimatic and remotely sensed data, distance, and terrain attributes were used as environmental covariates to model and map soil textural classes. Based on mean minimal depth (MMD), distance and annual mean precipitation were the important covariates when the imbalanced data were used, whereas terrain attributes and remotely sensed data played the key role in predicting soil texture when the balanced data were used. Balancing the data also increased the overall accuracy from 44% to 59% and the kappa value from 0.30 to 0.52. Similar increases were observed for recall and F-scores. It is concluded that, when soil texture classes are modeled with RF in a digital soil mapping approach, the data should be balanced before modeling.
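To make the resampling and scoring steps in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes purely numeric covariates, and it illustrates only the core SMOTE idea (interpolating between a minority sample and one of its nearest same-class neighbours) together with the overall accuracy and Cohen's kappa calculations. The function names `smote` and `overall_accuracy_and_kappa`, the toy covariate values, and the class labels are all hypothetical. Fitting the random forest itself is omitted.

```python
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the largest class count.

    Simplified SMOTE sketch: each synthetic point is a random interpolation
    between a sample and one of its k nearest neighbours of the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # already balanced, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> mate
            X_out.append([a + gap * (b - a) for a, b in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


def overall_accuracy_and_kappa(y_true, y_pred):
    """Return (overall accuracy, Cohen's kappa) for two label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance, from the marginal class frequencies
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: a single class everywhere
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical toy data: two covariates, five "loam" and two "clay" samples.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.30, 0.20], [0.25, 0.15],
     [0.90, 0.80], [0.80, 0.90]]
y = ["loam"] * 5 + ["clay"] * 2

X_bal, y_bal = smote(X, y, k=5, seed=0)
print(Counter(y_bal))

acc, kappa = overall_accuracy_and_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
print(acc, kappa)
```

In practice a maintained library implementation would normally be preferred over a hand-written one, and oversampling is usually applied to the training portion only so that synthetic points do not leak into the evaluation data. The abstract does not say how the authors handled this.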


Journal: Agronomy
Publication Date: Oct 24, 2022
Citations: 11
License type: CC BY 4.0


Similar Papers

Soil textural class modeling using digital soil mapping approaches: Effect of resampling strategies on imbalanced dataset predictions
Fereshteh Mirzaei ... Ruth Kerry
Geoderma Regional | VOL. 38
15 Jun 2024

Establishing management zones for irrigation using soil properties and Remote Sensing
Faten Ksantini ... Miguel Quemada
08 Mar 2024

Automated semiconductor wafer defect classification dealing with imbalanced data
Po-Hsuan Lee ... Wei Fang
20 Mar 2020

BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES
M Ustuner ... S Abdikan
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences | VOL. XLI-B7
21 Jun 2016
