Abstract

In many areas of animal behaviour research, improvements in our ability to collect large and detailed data sets are outstripping our ability to analyse them. These diverse, complex and often high-dimensional data sets exhibit nonlinear dependencies and unknown interactions across multiple variables, and may fail to conform to the assumptions of many classical statistical methods. The field of machine learning provides methodologies that are ideally suited to the task of extracting knowledge from these data. In this review, we aim to introduce animal behaviourists unfamiliar with machine learning (ML) to the promise of these techniques for the analysis of complex behavioural data. We start by describing the rationale behind ML and review a number of animal behaviour studies where ML has been successfully deployed. The ML framework is then introduced by presenting several unsupervised and supervised learning methods. Following this overview, we illustrate key ML approaches by developing data analytical pipelines for three different case studies that exemplify the types of behavioural and ecological questions ML can address. The first uses a large number of spectral and morphological characteristics that describe the appearance of pheasant, Phasianus colchicus, eggs to assign them to putative clutches. The second takes a continuous data stream of feeder visits from PIT (passive integrated transponder)-tagged jackdaws, Corvus monedula, and extracts foraging events from it, which permits the construction of social networks. Our final example uses aerial images to train a classifier that detects the presence of wildebeest, Connochaetes taurinus, to count individuals in a population. With the advent of cheaper sensing and tracking technologies, an unprecedented amount of data on animal behaviour is becoming available. We believe that ML will play a central role in translating these data into scientific knowledge and become a useful addition to the animal behaviourist's analytical toolkit.

Highlights

  • In many areas of animal behaviour research, improvements in our ability to collect large and detailed data sets are outstripping our ability to analyse them

  • We present a concise guide on the rationale behind unsupervised and supervised learning, and illustrate these methods by developing data analytical workflows to convert three data sets into useful biological knowledge: assigning pheasant, Phasianus colchicus, eggs to clutches based on their visual appearance, to subsequently study the response of brooding females to eggs that are not their own; constructing social networks based on co-occurrences of jackdaws, Corvus monedula, at feeding stations, to examine population level processes such as social learning; and automating the counting of individual wildebeest, Connochaetes taurinus, within aerial survey photos, to guide conservation policies

  • Machine learning (ML) offers a hypothesis-free approach to model complex data sets where the type of relationship between measured variables is unknown. These methodologies circumvent the limitations of many classical statistical models, and are an attractive choice for generating novel hypotheses to describe unwieldy data sets that are being acquired at an unprecedented rate in various fields of animal behaviour research


Summary

Introduction

In many areas of animal behaviour research, improvements in our ability to collect large and detailed data sets are outstripping our ability to analyse them. The logistical difficulties of collecting replicated data, especially from wild populations, mean that sample sizes are small, even though data on each individual may be rich, with many hundreds (or even thousands) of factors to consider. These complex data sets, generated from different sources, such as images and audio recordings, may fail to conform to assumptions of many classical statistical models (e.g. homoscedasticity and a Gaussian error structure). The generalization error or predictive performance is a measure of how many previously unseen images (known as the testing data set) the algorithm tags correctly. Both statistical modelling and ML seek to build a mathematical description, a model, of the data and the underlying mechanism it represents; inevitably there is substantial overlap between the two (Breiman, 2001b; Friedman, 2001; Ghahramani, 2015). We highlight some facets of animal behaviour where ML has already been deployed.
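The idea of predictive performance on a testing data set can be made concrete with a minimal sketch. Everything in it is an assumption chosen for illustration and is not the authors' pipeline: the synthetic two-feature vectors stand in for image descriptors, the labels "absent" and "present" are hypothetical, and a simple nearest-centroid rule replaces whatever classifier a real study would use. The point is only that the model is fitted on one part of the data and scored on examples it has never seen.

```python
import random


def nearest_centroid_fit(X, y):
    """Return the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, features):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def held_out_accuracy(X, y, test_fraction=0.3, seed=0):
    """Fit on a random training subset and score on the unseen remainder."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    centroids = nearest_centroid_fit(
        [X[i] for i in train_idx], [y[i] for i in train_idx]
    )
    correct = sum(
        nearest_centroid_predict(centroids, X[i]) == y[i] for i in test_idx
    )
    return correct / n_test


# Synthetic stand-in data (hypothetical): two well-separated Gaussian clusters.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(100)]
y = ["absent"] * 100 + ["present"] * 100

accuracy = held_out_accuracy(X, y)
print(f"Held-out accuracy: {accuracy:.2f}")
print(f"Estimated generalization error: {1 - accuracy:.2f}")
```

Scoring on the training examples themselves would tend to overstate performance, which is why the held-out fraction is kept separate from fitting.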
