Abstract

Underwater images are one of the most direct and effective means of obtaining underwater information. However, underwater images typically suffer from reduced contrast and colour distortion due to the absorption and scattering of light by water, which severely limits the further development of underwater vision tasks. Recently, convolutional neural networks have been applied extensively to underwater image enhancement for their powerful local feature extraction capabilities, but owing to the locality of the convolution operation, they cannot capture global context well. Although the recently emerging Transformer can capture global context, it cannot model local correlations. To address both limitations, Cformer, a U-Net-like hybrid network, is proposed. First, a Depth Self-Calibrated block is proposed to extract local image features effectively. Second, a novel Cross-Shaped Enhanced Window Transformer block is proposed, which captures long-range pixel interactions while dramatically reducing the computational complexity of self-attention over feature maps. Finally, the depth self-calibrated block and the cross-shaped enhanced window Transformer block are fused to build a global-local Transformer module. Extensive ablation studies on public underwater datasets demonstrate the effectiveness of the individual components of the network. Qualitative and quantitative comparisons indicate that Cformer achieves superior performance compared with other competitive models.
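
The abstract does not give the internals of the Cformer blocks, so the following is only a minimal PyTorch sketch of how a global-local module of this general kind can fuse a convolutional local branch with window-based self-attention. All names (LocalBranch, WindowAttention, GlobalLocalBlock), the sigmoid-gated calibration path, the window size, and the 1x1 fusion convolution are illustrative assumptions, not the actual Cformer design.

import torch
import torch.nn as nn


class LocalBranch(nn.Module):
    """Illustrative stand-in for the Depth Self-Calibrated block (assumed):
    a depthwise 3x3 convolution gated by a 1x1 calibration path."""

    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.calib = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        return self.dw(x) * torch.sigmoid(self.calib(x))


class WindowAttention(nn.Module):
    """Illustrative stand-in for the Cross-Shaped Enhanced Window Transformer
    block (assumed): multi-head self-attention inside non-overlapping windows,
    so the cost grows linearly with image size rather than quadratically
    as in global attention."""

    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        ws = self.window
        # Partition the feature map into (h//ws * w//ws) windows of ws*ws tokens.
        x = x.view(b, c, h // ws, ws, w // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        x, _ = self.attn(x, x, x)
        # Reverse the window partition back to a (b, c, h, w) feature map.
        x = x.view(b, h // ws, w // ws, ws, ws, c)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x


class GlobalLocalBlock(nn.Module):
    """Fuses the local (convolutional) and global (window attention) branches
    with a 1x1 convolution and a residual connection, echoing the abstract's
    global-local Transformer module."""

    def __init__(self, dim):
        super().__init__()
        self.local = LocalBranch(dim)
        self.glob = WindowAttention(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.local(x), self.glob(x)], dim=1))


x = torch.randn(1, 32, 64, 64)        # (B, C, H, W); H and W divisible by the window size
print(GlobalLocalBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])

In an encoder-decoder (U-Net-like) layout, blocks of this kind would typically replace the plain convolution stages at each resolution level, with the window size kept fixed so attention cost stays proportional to the number of pixels.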
