TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection

Kyeong-Beom Park, Jae Yeol Lee

doi:10.1109/access.2022.3223424

Abstract

Camouflaged object detection (COD) seeks to find concealed objects hidden in natural surroundings. Unlike salient object detection, COD is challenging because the foreground objects are intrinsically similar to their background surroundings and must still be told apart from them. Convolutional neural network (CNN)-based approaches have been proposed to overcome this challenge, but they have inherent limitations in modeling and extracting global contexts. Transformer-based approaches have been proposed to tackle this problem and can maintain the semantic features of input images, but they are limited in learning localized spatial features within a limited receptive field. Therefore, one of the main challenges is to conduct accurate and robust COD that maintains global contexts without sacrificing low-level contexts. This study proposes a novel concealed object detection and segmentation method using a Transformer and CNN-based advanced U-Net (TCU-Net). TCU-Net extracts globalized semantic features using a Swin Transformer-based encoder and localized spatial features using an attentive inception decoder. In particular, multi-dilated residual (MDR) blocks connecting the encoder and decoder generate refined multi-level features to improve discriminability. Finally, the attentive inception decoder generates the final camouflaged object mask while maintaining the localized spatial information. Instead of simply up-sampling the feature map, the attentive inception decoder conducts cascaded deconvolution through inception and attention modules. The model is optimized with a weighted hybrid loss function consisting of the binary cross entropy (BCE) and intersection over union (IoU) losses. We comprehensively compared the proposed TCU-Net with previous studies by analyzing different metrics on four public datasets: CAMO, CHAMELEON, COD10K, and NC4K. An ablation study was also conducted on the network architectures and loss functions to verify the advantages of the proposed approach. Experimental analysis on the public datasets shows that the proposed TCU-Net outperforms previous approaches.
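
The weighted hybrid loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact IoU formulation, so the function names, the default weights `w_bce` and `w_iou`, and the soft (probability-based) IoU used here are assumptions for illustration, not the authors' implementation. Predictions and ground-truth masks are treated as flattened lists of values in [0, 1].

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy over flattened pixel probabilities."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, with intersection and union computed on probabilities.

    Assumption: a common differentiable IoU surrogate, not necessarily
    the formulation used in the paper.
    """
    inter = sum(p * g for p, g in zip(pred, target))
    union = sum(p + g - p * g for p, g in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def hybrid_loss(pred, target, w_bce=1.0, w_iou=1.0):
    """Weighted sum of BCE and IoU losses (weights are hypothetical)."""
    return w_bce * bce_loss(pred, target) + w_iou * soft_iou_loss(pred, target)
```

In this sketch the BCE term penalizes per-pixel errors, while the IoU term penalizes poor overlap of the predicted mask as a whole; the two weights control the balance between them.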

Full Text