To autonomously detect wildlife images captured by camera traps on a platform with limited resources and address challenges such as filtering out photos without optimal objects, as well as classifying and localizing species in photos with objects, we introduce a specialized wildlife object detector tailored for camera traps. This detector is developed using a dataset acquired by the Saola Working Group (SWG) through camera traps deployed in Vietnam and Laos. Utilizing the YOLOv6-N object detection algorithm as its foundation, the detector is enhanced by a tailored optimizer for improved model performance. We deliberately introduce asymmetric convolutional branches to enhance the feature characterization capability of the Backbone network. Additionally, we streamline the Neck and use CIoU loss to improve detection performance. For quantitative deployment, we refine the RepOptimizer to train a pure VGG-style network. Experimental results demonstrate that our proposed method empowers the model to achieve an 88.3% detection accuracy on the wildlife dataset in this paper. This accuracy is 3.1% higher than YOLOv6-N, and surpasses YOLOv7-T and YOLOv8-N by 5.5% and 2.8%, respectively. The model consistently maintains its detection performance even after quantization to the INT8 precision, achieving an inference speed of only 6.15 ms for a single image on the NVIDIA Jetson Xavier NX device. The improvements we introduce excel in tasks related to wildlife image recognition and object localization captured by camera traps, providing practical solutions to enhance wildlife monitoring and facilitate efficient data acquisition. Our current work represents a significant stride toward a fully automated animal observation system in real-time in-field applications.
Read full abstract