Abstract
Motion-activated wildlife cameras (or “camera traps”) are frequently used to remotely and noninvasively observe animals. The vast number of images collected from camera trap projects has prompted some biologists to employ machine learning algorithms to automatically recognize species in these images, or at least to filter out images that do not contain animals. These approaches are often limited by model transferability, as a model trained to recognize species from one location might not work as well for the same species in different locations. Furthermore, these methods often require advanced computational skills, making them inaccessible to many biologists. We used 3 million camera trap images from 18 studies in 10 states across the United States of America to train two deep neural networks: one that recognizes 58 species, the “species model,” and one that determines whether an image is empty or contains an animal, the “empty-animal model.” Our species model and empty-animal model had accuracies of 96.8% and 97.3%, respectively. The models also performed well on some out-of-sample datasets, as the species model had 91% accuracy on species from Canada (accuracy range 36%–91% across all out-of-sample datasets) and the empty-animal model achieved an accuracy of 91%–94% on out-of-sample datasets from different continents. Our software addresses some of the limitations of using machine learning to classify images from camera traps. By including many species from several locations, our species model is potentially applicable to many camera trap studies in North America. We also found that our empty-animal model can facilitate removal of images without animals globally. We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny Applications that allow scientists with minimal programming experience to use trained models and train new models in six neural network architectures with varying depths.
Highlights
Motion-activated wildlife cameras are frequently used to remotely observe wild animals, but images from camera traps must be classified to extract their biological data (O’Connell, Nichols, & Karanth, 2011)
We found that our empty-animal model can facilitate removal of images without animals globally
We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny Applications that allow scientists with minimal programming experience to use trained models and train new models in six neural network architectures with varying depths
Summary
Motion-activated wildlife cameras (or “camera traps”) are frequently used to remotely observe wild animals, but images from camera traps must be classified to extract their biological data (O’Connell, Nichols, & Karanth, 2011). Several researchers have provided excellent Python repositories for using computer vision to analyze camera trap images (Beery et al., 2019; Beery, Wu, Rathod, Votel, & Huang, 2020; Norouzzadeh et al., 2018; Schneider et al., 2020). These software packages enable programmers to use and train models to detect, classify, and evaluate the behavior of animals in camera trap images. To facilitate the use of this type of model by biologists with minimal programming experience, Machine Learning for Wildlife Image Classification (MLWIC2) includes an option to train and use models in user-friendly Shiny Applications (Chang, Cheng, Allaire, Xie, & McPherson, 2019), allowing users to point and click instead of using a command line. This facilitates site-specific model training when our models do not perform to expectations.
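As an illustration of this workflow, the following is a minimal sketch of classifying a folder of camera trap images with the pre-trained models in R. The function names (setup, classify, runShiny) follow the MLWIC2 package documentation, but all file paths are hypothetical placeholders, and exact argument names and option values may differ between package versions, so consult the documentation before running.

# Install MLWIC2 from GitHub (one-time step)
# devtools::install_github("mikeyEcology/MLWIC2")
library(MLWIC2)

# One-time setup: installs the Python/TensorFlow dependencies that the
# pre-trained models require (the Python location shown is a placeholder)
setup(python_loc = "/usr/bin/")

# Classify images with the pre-trained species model; all paths below are
# hypothetical, and log_dir selects which trained model to apply
# ("species_model" here; the empty-animal model is the other option)
classify(path_prefix = "/home/user/camera_images",        # folder containing images
         data_info   = "/home/user/image_labels.csv",     # CSV listing image file names
         model_dir   = "/home/user/MLWIC2_helper_files",  # downloaded model files
         python_loc  = "/usr/bin/",
         log_dir     = "species_model")

# Alternatively, launch the point-and-click Shiny interface described above
runShiny("classify")

Users who prefer the graphical interface can perform the same steps through the Shiny Applications (e.g., runShiny("setup") and runShiny("classify")) without touching the command line.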