Abstract

Distinguishing underwater images from natural images is one of the challenges in collecting and generating underwater image data. Common image classification and recognition models classify the objects in an image based on saliency while suppressing the background. In this article, an inception–attention network (IA-Net), a convolutional neural network (CNN)-based model for distinguishing underwater images from natural images, is reported, in which an inception–attention (I-A) module is constructed to simulate the visual correlation mechanism used to classify images taken in special environments such as fog, nighttime, and under water. It is illustrated that the context background is as important as the salient object when understanding underwater images. We conducted experiments on a data set consisting of 4000 underwater images and 5000 nonunderwater images and demonstrate that the proposed IA-Net achieves an accuracy of 99.3% on underwater image classification, which is significantly better than classical image classification networks such as AlexNet, InceptionV3, and ResNet. In addition, comparative experiments show that IA-Net is superior to other networks in distinguishing underwater images from foggy images, nighttime images, and fish images taken in nonunderwater environments, even though these images share characteristics that make them hard to tell apart from underwater images. Moreover, we demonstrate that the proposed I-A structure can be used to boost the performance of existing object recognition networks. By substituting the inception module with the I-A module, the Inception-ResNetV2 network achieves a 10.7% top-1 error rate on a subset of ILSVRC-2012, which further illustrates the effectiveness of the correlation between the image background and subjective perception in improving the performance of visual analysis tasks.
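The abstract describes the I-A module only at a high level. As a rough illustration, the PyTorch sketch below combines a standard inception block (parallel branches with different receptive fields) with a squeeze-and-excitation-style channel attention gate, so that global context, including background statistics, can reweight the concatenated branch responses. The branch widths, reduction ratio, and attention design are assumptions for illustration, not the authors' published architecture.

```python
# Hypothetical sketch of an inception-attention (I-A) style block in PyTorch.
# Branch widths and the channel-attention design are illustrative assumptions;
# the published IA-Net architecture may differ.
import torch
import torch.nn as nn


class InceptionAttentionBlock(nn.Module):
    def __init__(self, in_channels, branch_channels=32, reduction=4):
        super().__init__()
        # Inception-style parallel branches with different receptive fields.
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, kernel_size=1),
            nn.BatchNorm2d(branch_channels), nn.ReLU(inplace=True))
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(branch_channels), nn.ReLU(inplace=True))
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(branch_channels), nn.ReLU(inplace=True))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, branch_channels, kernel_size=1),
            nn.BatchNorm2d(branch_channels), nn.ReLU(inplace=True))

        out_channels = 4 * branch_channels
        # Channel attention (squeeze-and-excitation style): global average
        # pooling summarizes the whole image, including the background, and
        # produces per-channel weights for the concatenated branch outputs.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // reduction, out_channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, x):
        features = torch.cat([
            self.branch1x1(x), self.branch3x3(x),
            self.branch5x5(x), self.branch_pool(x)], dim=1)
        return features * self.attention(features)


if __name__ == "__main__":
    block = InceptionAttentionBlock(in_channels=64)
    out = block(torch.randn(2, 64, 56, 56))
    print(out.shape)  # torch.Size([2, 128, 56, 56])
```

Such a block could, in principle, replace an inception module inside a backbone such as Inception-ResNetV2, which is how the abstract reports the I-A module being used to improve an existing recognition network.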
