Computer vision-based image classification plays a vital role in developing surveillance tools for monitoring the biological behavior of bees and detecting their diseases. Native bees face numerous environmental threats, ranging from invasive bees to parasitic diseases, which affect not only the existing ecosystem but also the booming honey and wax industries. Numerous ML-based, pre-trained models have shown potential in bee classification and monitoring tasks, but their reliance on heavily curated datasets and closed-set assumptions hinders their applicability to in-field monitoring tasks. In this paper, we propose a deep learning model that obtains improved feature representations of eleven economically important bee species, performs fine-grained object (e.g. parasite, pollen) detection for bee-health monitoring, and gradually progresses to an end-to-end model that provides a solution for bee surveillance. Our model extracts learned feature representations from publicly available images with complex backgrounds, and, through a qualitative analysis of the defining features it learns, we propose similar usage in other domains, specifically for morphological classification. In particular, we utilize a variant of the transformer encoder-decoder architecture that incorporates image features extracted by a ResNet50 network. Our model obtained 92.45% classification accuracy on the bee species classification task and up to 99.18% on the fine-grained object detection sub-tasks. Beyond the classification task, our end-to-end model can detect varroa pests and pollen on bee images with accuracies of 94.50% and 99.18%, respectively. Our model outperformed other existing models used in bee surveillance and health monitoring tools. We also discuss the applicability of our BeenNet model in real-time settings. Overall, our end-to-end model has implications for both computer vision and biological computing tasks, such as visual feature extraction, in-domain classification, and sub-task identification.
It will also serve as a baseline for future bee monitoring tools and a multi-modal model for disease detection.