Abstract
Fish behavior recognition in aquaculture is hampered by environmental noise and dim lighting, which degrade unimodal methods that rely on either sound or visual cues alone. This paper proposes Mul-SEResNet50, a fish behavior recognition model based on the fusion of audio and visual information. Because image blurring and indistinct sounds in aquaculture environments limit how well the two modalities can be fused and complement each other, a multimodal interaction fusion (MIF) module is introduced that integrates the audio and visual modalities at multiple stages to obtain a more comprehensive joint feature representation. To enhance complementarity during fusion, a U-shaped bilinear fusion structure is designed to fully utilize multimodal information, capture cross-modal associations, and extract high-level features. To reduce the potential loss of key features, a temporal aggregation and pooling (TAP) layer is introduced that preserves more fine-grained features by extracting both the maximum and average values within each pooling region. Ablation and comparative experiments are conducted to validate the proposed model. The results show that Mul-SEResNet50 improves accuracy by 5.04% over SEResNet50 without sacrificing detection speed. Compared with the state-of-the-art U-FusionNet-ResNet50+SENet model, Mul-SEResNet50 improves accuracy and F1 score by 0.47% and 1.32%, respectively. These findings confirm that the proposed model recognizes fish behavior accurately, which facilitates precise monitoring of fish behavior.
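The pooling idea behind the TAP layer, keeping both the maximum and the average of each pooling region, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `max_avg_pool_1d`, the one-dimensional input, the `window` and `stride` parameters, and the choice to return the two statistics as a pair are all assumptions made for illustration, since the abstract does not state how the two values are combined.

```python
from statistics import fmean


def max_avg_pool_1d(values, window, stride=None):
    """Pool a 1-D sequence, keeping both the max and the mean of each region.

    Hypothetical helper: the abstract does not specify how the TAP layer
    combines the two statistics, so they are returned side by side.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Default to non-overlapping regions when no stride is given.
    stride = window if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")

    pooled = []
    # Slide over the sequence, skipping any incomplete trailing region.
    for start in range(0, len(values) - window + 1, stride):
        region = values[start:start + window]
        pooled.append((max(region), fmean(region)))
    return pooled


# Toy feature sequence along the time axis (made-up values).
features = [1.0, 3.0, 2.0, 4.0, 4.0, 4.0]
print(max_avg_pool_1d(features, window=3))
```

Max pooling alone keeps only the strongest response in a region and average pooling alone smooths it away; retaining both is one plausible reading of how the layer preserves more fine-grained features.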