Abstract

The aim of camouflaged object detection (COD) is to find objects that are hidden in their surrounding environment. Due to factors such as low illumination, occlusion, small object size, and high similarity to the background, COD is widely recognized as a challenging task. In this paper, we propose a general COD framework, termed MSCAF-Net, which focuses on learning multi-scale context-aware features. To this end, we first adopt the improved Pyramid Vision Transformer (PVTv2) as the backbone to extract global contextual information at multiple scales. An enhanced receptive field (ERF) module is then designed to refine the features at each scale. Further, a cross-scale feature fusion (CSFF) module is introduced to achieve sufficient interaction of multi-scale information, enriching the scale diversity of the extracted features. In addition, inspired by the mechanism of the human visual system, a dense interactive decoder (DID) module is devised to output a rough localization map, which is used to modulate the fused features obtained in the CSFF module for more accurate detection. The effectiveness of our MSCAF-Net is validated on four benchmark datasets. The results show that the proposed method outperforms state-of-the-art (SOTA) COD models by a large margin. Besides, we also investigate the potential of our MSCAF-Net on other vision tasks that are closely related to COD, such as polyp segmentation, COVID-19 lung infection segmentation, transparent object detection, and defect detection. Experimental results demonstrate the high versatility of the proposed MSCAF-Net. The source code and results of our method are available at https://github.com/yuliu316316/MSCAF-COD.
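The abstract describes a five-stage pipeline: a multi-scale backbone, per-scale ERF refinement, cross-scale fusion (CSFF), a coarse localization map from the decoder (DID), and modulation of the fused features by that map. The following is a minimal structural sketch of that data flow only; the module bodies (strided slicing, identity refinement, nearest-neighbour resizing, averaging) are hypothetical NumPy placeholders for illustration, not the authors' learned layers.

```python
import numpy as np

def nearest_resize(f, shape):
    # Nearest-neighbour resize: a stand-in for learned up/down-sampling.
    ys = np.linspace(0, f.shape[0] - 1, shape[0]).round().astype(int)
    xs = np.linspace(0, f.shape[1] - 1, shape[1]).round().astype(int)
    return f[np.ix_(ys, xs)]

def mscaf_net(image):
    # 1) Backbone (PVTv2 in the paper): features at four scales, strides 4-32.
    feats = [image[::s, ::s] for s in (4, 8, 16, 32)]
    # 2) ERF: per-scale refinement (identity placeholder here).
    feats = [f * 1.0 for f in feats]
    # 3) CSFF: every output scale aggregates information from all input scales.
    fused = [sum(nearest_resize(g, f.shape) for g in feats) for f in feats]
    # 4) DID: a rough localization map built from all fused scales
    #    (averaging placeholder), squashed to a soft mask.
    coarse = sum(nearest_resize(f, fused[0].shape) for f in fused) / len(fused)
    coarse = 1.0 / (1.0 + np.exp(-coarse))
    # 5) Modulate the fused features with the coarse map before prediction.
    refined = [f * nearest_resize(coarse, f.shape) for f in fused]
    return coarse, refined

coarse, refined = mscaf_net(np.random.rand(128, 128))
```

For a 128x128 input, the sketch yields a 32x32 coarse map and four modulated feature maps, one per backbone scale, mirroring the coarse-to-fine modulation the abstract describes.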
