The vast number of images generated by aerial imagery in regular wildlife surveys now requires automatic processing tools. Among the many methods for automatically detecting objects in images, deep learning based object detection is the current state of the art. The recent focus on this task has produced many neural network architectures that are benchmarked against standard datasets such as Microsoft's Common Objects in Context (COCO). Performance on COCO, a large computer vision dataset, is reported as mean Average Precision (mAP). In this study, we use six pretrained networks to detect red deer in aerial images, three of which have, to our knowledge, never been used in the context of aerial wildlife surveys. We compare their performance using COCO's mAP and a metric common in animal surveys, the F1-score. We also evaluate how dataset imbalance and background uniformity, two common difficulties in wildlife surveys, impact the performance of our models. Our results show that mAP is not a reliable metric for selecting the best model to count animals in aerial images and that a counting-focused metric such as the F1-score should be favored instead. Our best overall performance was achieved with Generalized Focal Loss (GFL), which scored highest on both metrics, combining the most accurate counting and localization (average F1-scores of 0.96 and 0.97 and average mAP scores of 0.77 and 0.89 on the two datasets, respectively), and is therefore very promising for future applications. While both imbalance and background uniformity improved the performance of our models, their combined effect had twice as much impact as the choice of architecture. This finding seems to confirm that the recent data-centric shift in the deep learning field could also lead to performance gains in wildlife surveys.
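To illustrate the counting-focused metric the abstract advocates, here is a minimal sketch of how an F1-score can be computed from detection counts. The counts used below are hypothetical examples, not results from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1-score from detection counts.

    tp: detections matched to a ground-truth animal (true positives)
    fp: detections with no matching animal (false positives)
    fn: animals missed by the detector (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical counts for illustration only: 95 deer correctly
# detected, 3 spurious detections, 5 deer missed.
p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=5)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Unlike mAP, which averages precision over many localization thresholds and confidence levels, the F1-score directly balances missed animals against spurious detections, which is why it is better aligned with a counting objective.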