Recognizing fish feeding behavior with machine vision is essential for optimizing feeding strategies and improving the efficiency of aquaculture. Building an efficient multi-feature extraction model for fish feeding recognition, especially on mobile and edge devices, remains a significant challenge. In this paper, we propose a novel multi-feature extraction (MFE)-MobileViTv3 model, which improves MobileViTv3 with MFE blocks. It extracts spatio-temporal features while remaining lightweight. The MFE block is designed by improving ActionNet with frequency channel attention (FCA) and multi-head self-attention (MHSA) mechanisms. It extracts spatio-temporal, motion, and channel features from video streams, which strengthens feature extraction and in turn improves the model's recognition accuracy. The experiments were carried out on an industrial aquaculture farm. We built a dataset of Micropterus salmoides and then conducted comparative experiments. Compared with C3D, R3D, ResNet50, SlowFast, AlexNet, and MobileNetV3, our model achieves a classification accuracy of 96.7% for feeding intensity with fewer parameters (0.96 M) and FLOPs (8.44 G). The results show that the proposed model can effectively recognize fish feeding behavior with fewer parameters. Additionally, we introduce two evaluation metrics for the feeding process: Average Feeding Intensity and Strong Feeding Ratio. These metrics support quantitative evaluation of the fish's vigor and health status.
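The abstract names the two feeding-process metrics but does not define them. The sketch below shows one plausible reading. It assumes the classifier emits one integer intensity label per video clip, that the labels run from 0 (none) to 3 (strong), that Average Feeding Intensity is the mean label over a feeding session, and that Strong Feeding Ratio is the fraction of clips labeled strong. The label coding, the function names, and both formulas are illustrative assumptions, not the paper's definitions.

```python
from typing import Sequence

# Hypothetical label coding (an assumption, not taken from the paper):
# 0 = none, 1 = weak, 2 = medium, 3 = strong
STRONG = 3


def average_feeding_intensity(labels: Sequence[int]) -> float:
    """Mean of the per-clip intensity labels over one feeding session."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def strong_feeding_ratio(labels: Sequence[int], strong: int = STRONG) -> float:
    """Fraction of clips whose predicted label equals the strong class."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == strong) / len(labels)


# Example session: per-clip predictions in time order (made-up values).
session = [3, 3, 2, 3, 1, 0, 0, 2]
afi = average_feeding_intensity(session)
sfr = strong_feeding_ratio(session)
print(f"Average Feeding Intensity: {afi:.3f}")
print(f"Strong Feeding Ratio: {sfr:.3f}")
```

Under these assumptions, a session with a high mean and a high strong-class fraction would suggest vigorous feeding, while a drop in either value across sessions could flag reduced appetite.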