Predictive pollen-based biome modeling using machine learning.

Magdalena K Sobol, Sarah A Finkelstein

doi:10.1371/journal.pone.0202214

Abstract

This paper investigates the suitability of supervised machine learning classification methods for classifying biomes from pollen datasets. We assign modern pollen samples from Africa and Arabia to five biome classes using a previously published African pollen dataset and a global ecosystem classification scheme. To test the applicability of traditional and machine-learning-based classification models to the task of biome prediction from high-dimensional modern pollen data, we train a total of eight classification models: Linear Discriminant Analysis, Logistic Regression, Naïve Bayes, K-Nearest Neighbors, Classification Decision Tree, Random Forest, Neural Network, and Support Vector Machine. The ability of each model to predict biomes from pollen data is statistically tested on an independent test set. The Random Forest classifier outperforms the other models in its ability to correctly classify biomes given pollen data. Out of the eight models, the Random Forest classifier scores highest on all of the metrics used for model evaluation and is able to predict four out of five biome classes to a high degree of accuracy, including arid, montane, tropical and subtropical closed and open systems, e.g. forests and savanna/grassland. The model has the potential for accurate reconstructions of past biomes and awaits application to fossil pollen sequences. The Random Forest model may be used to investigate vegetation changes on both long and short time scales, e.g. during glacial and interglacial cycles, or during more recent and abrupt climatic anomalies like the African Humid Period. Such applications may contribute to a better understanding of past shifts in vegetation cover and ultimately provide valuable information on drivers of climate change.

Highlights

Past environmental conditions can be inferred from proxy data such as pollen
The objectives of this paper are to: 1) review machine learning classification methods suitable for predicting biomes from pollen datasets; 2) test the applicability of supervised machine learning classification models to the task of biome prediction from more complete modern pollen data, given a training set of a priori labeled observations; 3) analyze and statistically compare the chosen classification methods; 4) identify, using statistical measures, the highest-performing classification model able to accurately predict biomes from modern pollen data; 5) qualitatively compare our best ML-based model against the classical biomization method previously developed for the region
Multi-class classification may be reframed as a set of simpler binary classification problems via an approach known as one-versus-rest [21], wherein a separate classification model is fitted for each individual biome class against the rest of the biome classes combined
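
The one-versus-rest idea in the last highlight can be illustrated with a minimal Python sketch. This is not the authors' implementation: the binary learner (a plain logistic regression fitted by gradient descent), the three toy biome labels, the three-taxon feature vectors, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Probability that sample x is in the positive class (bias is w[-1])."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def fit_binary_logistic(X, y, lr=0.5, epochs=500):
    """Fit one binary logistic regression by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)  # feature weights followed by the bias
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fit_one_vs_rest(X, labels):
    """Fit one binary model per class: that class (1) versus all others (0)."""
    models = {}
    for c in sorted(set(labels)):
        binary_targets = [1 if label == c else 0 for label in labels]
        models[c] = fit_binary_logistic(X, binary_targets)
    return models

def predict_one_vs_rest(models, x):
    """Return the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical toy data: proportions of [grass, tree, desert shrub] pollen.
X = [
    [0.10, 0.10, 0.80], [0.20, 0.00, 0.80], [0.15, 0.05, 0.80],  # arid
    [0.10, 0.80, 0.10], [0.20, 0.75, 0.05], [0.05, 0.90, 0.05],  # forest
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],  # savanna
]
labels = ["arid"] * 3 + ["forest"] * 3 + ["savanna"] * 3

models = fit_one_vs_rest(X, labels)
print(predict_one_vs_rest(models, [0.10, 0.85, 0.05]))
```

Each biome gets its own binary model, and a new sample is assigned to the biome whose model is most confident. Any binary classifier that returns a score could take the place of the logistic regression used here.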

Summary

Introduction

Past environmental conditions can be inferred from proxy data such as pollen. Studies of fossil pollen have been instrumental in our understanding of past shifts in vegetation [1,2] and variations in climate [3,4,5]. The accuracy of pollen-based paleoenvironmental reconstructions is dependent on numerically quantified relationships between modern pollen assemblages and variables of interest, be they quantitative or qualitative. These calibration sets allow for robust numerical modeling of pollen-vegetation-climate relationships. Meaningful estimates of past environments rely on large and accurate modern calibration sets [6].

Objectives

Methods

Results

Discussion

Conclusion