Abstract

The implementation of intelligent software to identify and classify objects and individuals in visual fields is a technology of growing importance to practitioners in many fields, including wildlife conservation and management. To non-experts, the methods can be abstruse and the results mystifying. Here, in the context of applying cutting-edge methods to classify wildlife species from camera-trap data, we shed light on the methods themselves and the types of features these methods extract to make efficient identifications and reliable classifications. The current state of the art is to employ convolutional neural networks (CNNs) encoded within deep-learning algorithms. We outline these methods and present results obtained in training a CNN to classify 20 African wildlife species with an overall accuracy of 87.5% from a dataset containing 111,467 images. We demonstrate the application of a gradient-weighted class-activation-mapping (Grad-CAM) procedure to extract the most salient pixels in the final convolution layer. We show that these pixels highlight features in particular images that in some cases are similar to those used to train humans to identify these species. Further, we used mutual information methods to identify the neurons in the final convolution layer that consistently respond most strongly across a set of images of one particular species. We then interpret the features in the image where the strongest responses occur, and present dataset biases that were revealed by these extracted features. We also used hierarchical clustering of feature vectors (i.e., the state of the final fully-connected layer in the CNN) associated with each image to produce a visual-similarity dendrogram of identified species. Finally, we evaluated the relative unfamiliarity of images that were not part of the training set, comparing images of the 20 species “known” to our CNN with images of species that were “unknown” to our CNN.
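
To make the Grad-CAM step concrete, the following is a minimal sketch in PyTorch. It assumes a VGG-16 whose final layer has been replaced for 20 species; the image path and checkpoint name are hypothetical, and this illustrates the general Grad-CAM procedure rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# VGG-16 with the final layer replaced for 20 species; in practice the
# fine-tuned checkpoint would be loaded here (path is hypothetical).
model = models.vgg16(weights=None)
model.classifier[6] = torch.nn.Linear(4096, 20)
# model.load_state_dict(torch.load("vgg16_gorongosa.pt"))
model.eval()

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]

# Hook the final convolutional layer of VGG-16 (features[28] is conv5_3).
model.features[28].register_forward_hook(save_activation)
model.features[28].register_full_backward_hook(save_gradient)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("camera_trap_image.jpg").convert("RGB")).unsqueeze(0)
scores = model(img)                      # class scores for the 20 species
cls = scores.argmax(dim=1).item()        # predicted species
model.zero_grad()
scores[0, cls].backward()                # gradients w.r.t. the top class score

# Grad-CAM: weight each feature map by its spatially averaged gradient,
# sum across channels, keep positive evidence, and normalize to [0, 1].
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1)
cam = F.relu((weights * activations["value"]).sum(dim=1))     # (1, 14, 14)
cam = cam / (cam.max() + 1e-8)
heatmap = F.interpolate(cam.unsqueeze(1), size=img.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
```

The resulting heatmap can be overlaid on the input image to visualize which pixels most influenced the predicted class.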

Highlights

  • Collecting animal imagery data with motion-sensitive cameras is a minimally invasive approach to obtaining relative densities and estimating population trends in animals over time[1,2]

  • To some extent, the features used by the convolutional neural network (CNN) to identify animals were similar to those used by humans

  • In the Discussion section, we provide a brief example of how interpretations of CNNs can help to understand the causes of misclassification and to make potential improvements to the method (a minimal sketch of such an analysis follows this list)
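
A misclassification audit of this kind might begin with a per-species confusion matrix. The sketch below uses scikit-learn; the species names, labels, and predictions are invented for illustration and are not results from the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

species = ["baboon", "bushbuck", "warthog"]   # illustrative subset of the 20 classes
y_true = np.array([0, 0, 1, 1, 2, 2])         # hypothetical ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])         # hypothetical CNN predictions

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
for i, name in enumerate(species):
    errors = cm[i].sum() - cm[i, i]
    print(f"{name}: {cm[i, i]} correct, {errors} misclassified")
```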


Introduction

Collecting animal imagery data with motion-sensitive cameras is a minimally invasive approach to obtaining relative densities and estimating population trends in animals over time[1,2]. Deep-learning methods[5] have revolutionized our ability to train digital computers to recognize all kinds of objects from imagery data, including faces[6,7] and wildlife species[4,8,9] (see Appendix 1 for more background information), and they may significantly increase the efficiency of associated ecological studies[4,10]. These methods, however, lack the transparency necessary for effective implementation and reproducibility in wildlife ecology and conservation biology: it is seldom clear which image features drive their classifications. To identify such features in the context of classifying wildlife from camera-trap data, we trained a convolutional neural network (CNN)[9,13] using a deep-learning algorithm (VGG-16, as described elsewhere[14] and in Appendix 4) on a fully annotated dataset from Gorongosa National Park, Mozambique (Appendix 2), that has not previously been subjected to machine learning. In the Discussion section, we provide a brief example of how interpretations of CNNs can help to understand the causes of misclassification and to make potential improvements to the method.
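
The following is a hedged sketch of this transfer-learning setup in PyTorch: VGG-16 pretrained on ImageNet, with the final fully-connected layer replaced for 20 species. The data directory and hyperparameters are illustrative assumptions, not values from the paper.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical directory of annotated camera-trap images, one folder per species.
train_set = datasets.ImageFolder("gorongosa/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from ImageNet weights and replace the 1000-class head with 20 species.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 20)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for images, labels in loader:          # one pass over the data for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```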
