Abstract

How soil legacy data are used as training data, and which environmental covariates are selected, can drive the accuracy of machine learning techniques. This study therefore evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate this information to a similar area. The following training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. These four datasets were submitted to principal component analysis (PCA) to reduce dimensionality, and each one gave rise to a new, reduced dataset. Different combinations of predictor variables were tested. A total of 52 models were evaluated in terms of model error, prediction uncertainty, and external validation (overall accuracy and Kappa index). The best result was obtained by reducing the number of predictors with PCA combined with information from the buffer around the points. Although Random Forest is considered a robust spatial prediction model, it proved sensitive to the strategy used to select the training dataset. Effort was needed to find the training dataset that achieved a suitable level of spatial prediction accuracy. Identifying a specific dataset seems to be better than using a large number of variables or a large volume of training data. This effort made it possible to accurately map an area 15.5 times larger than the reference area.
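The external validation above relies on overall accuracy and the Kappa index. As a minimal sketch, assuming validation data arrive as paired lists of observed and predicted soil classes, overall accuracy is the observed agreement p_o, and Cohen's kappa corrects it for chance agreement p_e as κ = (p_o - p_e) / (1 - p_e). The function name and the example labels below are hypothetical and are not the study's data.

```python
from collections import Counter


def accuracy_and_kappa(observed, predicted):
    """Return (overall accuracy, Cohen's kappa) for paired class labels."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    # p_o: share of validation sites where the predicted class matches the observed one
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # p_e: agreement expected by chance, from the class frequencies of each list
    # (the row and column totals of the confusion matrix)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        # kappa is undefined when both lists contain one and the same single class
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical validation labels (not the study's data)
observed = ["Latossolo", "Latossolo", "Argissolo", "Cambissolo", "Argissolo", "Latossolo"]
predicted = ["Latossolo", "Argissolo", "Argissolo", "Cambissolo", "Argissolo", "Latossolo"]
oa, kappa = accuracy_and_kappa(observed, predicted)
print(f"overall accuracy = {oa:.3f}, kappa = {kappa:.3f}")  # overall accuracy = 0.833, kappa = 0.739
```

Here five of six sites agree (p_o = 5/6), while chance agreement from the class frequencies is p_e = 13/36, so kappa (17/23) is noticeably lower than overall accuracy.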

Highlights

  • In Brazil there is a need for maps on a detailed scale, but few resources are available for soil surveys

  • The following training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries

  • This difference is clear when comparing models trained on point-derived datasets with those trained on polygon-derived datasets, the latter having more observations and less error
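The buffer and boundary-exclusion datasets can be illustrated on a rasterized soil map. The sketch below assumes a hypothetical grid of polygon labels with square cells of `cell_size` metres. `interior_cells` keeps only cells whose neighbourhood within the exclusion distance shares a single polygon label (a raster approximation of excluding 20 m or 30 m from polygon boundaries), and `buffer_cells` assigns each sample's class to cells within a buffer radius of the sample point. The function names, grid, coordinates and cell size are illustrative assumptions, not the authors' GIS workflow.

```python
import math


def interior_cells(label_grid, cell_size, exclusion):
    """Return (row, col, label) for cells roughly farther than `exclusion` metres from a boundary.

    Raster approximation: a cell is kept only if every cell whose centre lies
    within `exclusion` metres of its own centre carries the same polygon label.
    Cells whose neighbourhood reaches past the map edge are dropped as well.
    """
    nrows, ncols = len(label_grid), len(label_grid[0])
    reach = math.ceil(exclusion / cell_size)  # neighbourhood half-width in cells
    kept = []
    for r in range(nrows):
        for c in range(ncols):
            label = label_grid[r][c]
            ok = True
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    if math.hypot(dr, dc) * cell_size > exclusion:
                        continue  # neighbour centre lies outside the exclusion distance
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < nrows and 0 <= cc < ncols) or label_grid[rr][cc] != label:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                kept.append((r, c, label))
    return kept


def buffer_cells(points, nrows, ncols, cell_size, radius):
    """Map (row, col) -> soil class for cells whose centre lies within `radius` of a sample.

    `points` holds (x, y, soil_class) in metres from the grid's upper-left corner,
    with x along columns and y increasing downwards along rows.
    Where buffers overlap, later points overwrite earlier ones.
    """
    samples = {}
    for x, y, soil_class in points:
        for r in range(nrows):
            for c in range(ncols):
                cx, cy = (c + 0.5) * cell_size, (r + 0.5) * cell_size
                if math.hypot(cx - x, cy - y) <= radius:
                    samples[(r, c)] = soil_class
    return samples


# Hypothetical 8 x 10 soil map with 10 m cells: polygon "A" on the left, "B" on the right
soil_map = [["A"] * 5 + ["B"] * 5 for _ in range(8)]
polygon_samples = interior_cells(soil_map, cell_size=10, exclusion=20)
# Hypothetical soil profile at x = 15 m, y = 15 m, buffered by 10 m
point_samples = buffer_cells([(15.0, 15.0, "A")], nrows=8, ncols=10, cell_size=10, radius=10)
print(len(polygon_samples), len(point_samples))  # 8 5
```

With the distances reported in the abstract, the same calls would use `exclusion=20` or `exclusion=30` and `radius=30`; a finer cell size makes the raster approximation closer to true vector distances.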


Summary

Introduction

In Brazil there is a need for maps on a detailed scale, but few resources are available for soil surveys. Soil legacy data could be a source of training data for machine learning techniques (Pelegrino et al., 2016), which can formalize soil-landscape relationships, apply this information to areas under similar environmental conditions, and improve mapping while saving both time and cost (Silva et al., 2016). This is an important strategy for mapping in Brazil, because detailed soil surveys are restricted to small areas (Mendonça-Santos and Santos, 2007). How legacy data should be used, whether as points or as polygons, to provide a suitable source of training data for Random Forest should therefore be investigated.
