Abstract

The detection of underwater fish targets is critical for ecological monitoring and marine biodiversity research. However, underwater fish detection is typically constrained by low image quality and variable underwater surroundings. To further improve fish detection accuracy in complex underwater environments, this paper proposes a dual-path (DP) Pyramid Vision Transformer (PVT) feature extraction network named DP-FishNet. The backbone network, DP-PVT, is built on the PVT and comprises two feature extraction paths. The first is the Vision Transformer path, which extracts global features to sharpen the distinction between the foreground and background of underwater images. The second is the convolutional neural network path, which improves the detection of small targets by extracting local features. Additionally, to exploit the extracted feature information more effectively, the content-aware reassembly of features (Carafe) operator is employed in the feature pyramid network (FPN). The seesaw loss is used as the classification loss to address the sample imbalance caused by differences in fish population sizes. Experimental results show that DP-FishNet achieves an AP of 76.0% and an AP50 of 95.2%. Compared with existing advanced two-stage detection algorithms, the computation and parameter counts are reduced by approximately 40%. DP-FishNet strengthens the extraction of global and local features from underwater images and enhances feature reuse, and it can be applied to detect fish targets in real, complex underwater habitats.
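The seesaw loss mentioned above rebalances classifier gradients between frequent and rare classes. A minimal sketch of its per-class-pair mitigation factor, following the published seesaw-loss formulation (the function name, exponent default, and example counts are illustrative, not taken from this paper), is:

```python
import math

def seesaw_mitigation(counts, p=0.8):
    """Compute the pairwise mitigation factors S[i][j] of seesaw loss.

    The negative-sample gradient that class i exerts on a rarer class j
    is scaled down by (N_j / N_i) ** p when N_j < N_i, and left at 1
    otherwise. `counts` holds accumulated instance counts per class;
    p is the mitigation exponent (0.8 in the original seesaw-loss work).
    """
    n = len(counts)
    factors = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if counts[j] < counts[i]:
                factors[i][j] = (counts[j] / counts[i]) ** p
    return factors

# Example: a common species (900 instances) vs. a rare one (100 instances).
# The gradient the frequent class pushes onto the rare class is suppressed,
# while the rare class still penalizes the frequent one at full strength.
factors = seesaw_mitigation([900, 100])
```

In DP-FishNet this mechanism serves to keep abundant fish species from overwhelming the classifier gradients of scarce ones.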
