Abstract

Automatic fish counting in sonar images has attracted increasing attention in recent years because manual counting is extremely labor-intensive. Density map regression is a promising approach to counting, but two obstacles stand in the way of fish counting in low-resolution sonar images: the difficulty of distinguishing fish from similar-looking background noise, and the inconsistency between the strip-shaped fish in the input images and the dot-shaped targets in the ground-truth density map. To address these issues, we present GPNet, a novel encoder-decoder network with global attention and point supervision, to improve the accuracy of sonar image-based fish counting. To reduce the impact of background noise, we incorporate a segmentation module (SM) with global self-attention into the neck of the network to identify fish regions and suppress background noise. Furthermore, feature enhancement modules (FEMs) with a global receptive field are introduced into the encoder to strengthen feature representation and discrimination. To overcome the performance ceiling imposed by the shape inconsistency between input targets and ground truth, we use fish center coordinates, rather than a Gaussian density map, to supervise network training directly. Extensive experiments on the ARIS dataset, a challenging public sonar image-based fish counting dataset, demonstrate that GPNet achieves state-of-the-art performance in both counting accuracy and noise removal.
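
For readers unfamiliar with density map regression, the following is a minimal sketch of the conventional Gaussian density-map target that the abstract contrasts with point supervision. It is not taken from the paper: the function name, the sigma value, the image size, and the sample center coordinates are all illustrative assumptions. It shows only the general idea that each annotated point is spread into a small blob and that the count is recovered by summing the map.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=2.0):
    """Build a density map whose sum equals the number of annotated points.

    points: iterable of (row, col) center coordinates (hypothetical annotations).
    shape:  (height, width) of the output map.
    sigma:  standard deviation of each Gaussian blob, in pixels (assumed value).
    """
    height, width = shape
    rows, cols = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=np.float64)
    for row, col in points:
        # Isotropic Gaussian centered on the annotated point.
        kernel = np.exp(-((rows - row) ** 2 + (cols - col) ** 2) / (2.0 * sigma ** 2))
        # Normalize inside the image so each object contributes exactly 1 to the sum.
        kernel /= kernel.sum()
        density += kernel
    return density


# Three hypothetical fish centers in a 64 x 64 image.
centers = [(10, 12), (30, 40), (31, 44)]
density = gaussian_density_map(centers, shape=(64, 64))
estimated_count = density.sum()
```

Each blob here is round regardless of the object's true outline, which illustrates the kind of mismatch with strip-shaped fish that the abstract describes.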
