Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds

Erica Racchetti, Catia Maurone, Valeria Rossi, Nicolò Bellin, Marco Bartoli

doi:10.3390/w13091217

Abstract

Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions, and its use in ecology is growing. Compared with other standard algorithms, fuzzy sets are suitable descriptors of ecological communities and allow the description of decisions that include elements of uncertainty and vagueness. However, fuzzy sets are rarely applied in ecology. In this work, an unsupervised machine learning algorithm (fuzzy c-means) and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support, and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that 2014 late spring and summer air temperatures were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds into two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies, which affect the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater) represent disturbance factors that produce large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in capturing such apparently erratic differences.

Highlights

Data in ecology often present high stochasticity, correlated features and a large number of predictors compared to the sample size of the dataset
The fuzzy-set theory provides a mathematical approach that is able to cope with imprecision
In this study we focused on the occurrence of the main zooplankton taxa in 24 pools and ponds that were randomly selected in a 200 km² area located in the Cremona province [50] (Figure 1)

Summary

Introduction

Data in ecology often present high stochasticity, correlated features and a large number of predictors compared to the sample size of the dataset. In recent decades, machine learning algorithms have become increasingly accessible in ecology thanks to advances in computational power and the availability of large amounts of data and of software [1]. These algorithms are well suited to complex and large ecological datasets and to nonlinearity [2]. Some machine learning algorithms are useful for datasets that contain more features than observations [3]. Unsupervised algorithms identify clusters and structures in data without considering target variables.
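
To illustrate how an unsupervised algorithm can group sites without a target variable, the following is a minimal sketch of fuzzy c-means written with NumPy. It is not the implementation used in this study. The function name `fuzzy_c_means`, the fuzzifier value `m = 2`, the convergence tolerance and the synthetic 24 x 3 data matrix are all assumptions made for illustration; the synthetic values do not represent the pond measurements.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means; X has shape (n_samples, n_features).

    Returns the cluster centers and the membership matrix U, where
    U[i, j] is the degree to which sample i belongs to cluster j.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Random initial memberships, normalized so that each row sums to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Centers are means of the samples weighted by membership ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance of every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero

        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U


# Synthetic example (assumed data): 24 "sites" described by 3 features,
# drawn from two well-separated groups of 12 sites each.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.3, size=(12, 3))
group_b = rng.normal(loc=[3.0, 3.0, 3.0], scale=0.3, size=(12, 3))
X = np.vstack([group_a, group_b])

centers, U = fuzzy_c_means(X, n_clusters=2)

# A hard assignment can be derived by taking the highest membership.
hard = U.argmax(axis=1)
```

Unlike a hard clustering, the matrix `U` retains graded memberships, so a site with intermediate features can belong partly to both groups. This is the property that makes fuzzy sets suited to describing uncertainty and vagueness.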

Methods

Results

Conclusion