Abstract

The current rate of biodiversity decline calls for urgent ecological conservation. In response, camera traps are increasingly deployed to monitor wildlife, and analysis of camera trap data can help curb species extinction. However, manual review of the footage consumes a substantial amount of time, which limits the use of camera traps for prompt decision-making. Severe visual challenges, together with the tendency of camera traps to record empty frames (frames showing only the natural backdrop, with no wildlife present), make wildlife detection and species recognition a demanding task. We therefore propose a pipeline for wildlife detection and species recognition that expedites the processing of camera trap sequences. The pipeline consists of three stages: (i) empty frame removal, (ii) wildlife detection, and (iii) species recognition and classification. For these stages we leverage the vision transformer (ViT), DEtection TRansformer (DETR), vision and detection transformer (ViDT), faster region-based convolutional neural network (Faster R-CNN), Inception v3, and ResNet-50. We examine how well these algorithms perform at new and unseen locations, where they face the challenges of domain generalisation. We demonstrate the effectiveness of the proposed pipeline on the Caltech Camera Traps (CCT) dataset.

Keywords: Camera traps; Empty frame removal; Wildlife detection; Wildlife species classification; Domain generalisation; DEtection TRansformer (DETR); Vision transformer (ViT); Vision and detection transformers (ViDT); Inception v3
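The three-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which model serves which stage, so each stage is passed in as a plain callable. All names here (`run_pipeline`, `Sighting`, `is_empty`, `detect`, `classify`, and the toy stand-ins) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Sighting:
    """One detected animal in one frame (hypothetical record type)."""
    frame_index: int
    box: Box
    species: str


def run_pipeline(
    frames: Sequence[Any],
    is_empty: Callable[[Any], bool],
    detect: Callable[[Any], List[Box]],
    classify: Callable[[Any, Box], str],
) -> List[Sighting]:
    """Chain the three stages: empty frame removal, detection, classification."""
    sightings: List[Sighting] = []
    for index, frame in enumerate(frames):
        # Stage (i): discard frames that show only the backdrop.
        if is_empty(frame):
            continue
        # Stage (ii): localise each animal in the remaining frame.
        for box in detect(frame):
            # Stage (iii): assign a species label to each detected region.
            sightings.append(Sighting(index, box, classify(frame, box)))
    return sightings


# Toy stand-ins so the sketch runs without any trained model.
# In practice each would wrap a trained network; these only read annotations.
def toy_is_empty(frame: dict) -> bool:
    return not frame["animals"]


def toy_detect(frame: dict) -> List[Box]:
    return [animal["box"] for animal in frame["animals"]]


def toy_classify(frame: dict, box: Box) -> str:
    for animal in frame["animals"]:
        if animal["box"] == box:
            return animal["species"]
    return "unknown"


frames = [
    {"animals": []},  # empty frame, dropped in stage (i)
    {"animals": [{"box": (10.0, 20.0, 50.0, 80.0), "species": "coyote"}]},
]
results = run_pipeline(frames, toy_is_empty, toy_detect, toy_classify)
```

Passing the stages as callables keeps the sketch independent of any particular model, so a classifier-based empty frame filter or a different detector could be swapped in without changing the control flow.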
