Ancient mural segmentation based on multiscale feature fusion and dual attention enhancement

Jianfang Cao, Zhen Cao, Zhiqiang Chen, Fang Wang, Xianhui Wang, Zhuolin Yang

doi:10.1186/s40494-024-01172-x

Abstract

Traditional segmentation methods applied to ancient mural images suffer from fuzzy segmentation boundaries, missing details, lost small targets and low efficiency. To address these problems, this paper proposes a mural segmentation model based on multiscale feature fusion and dual attention enhancement (MFAM). The model uses a MobileViT network that integrates a coordinate attention mechanism as its feature extraction backbone. Through self-attention, class convolution and coordinate attention, the backbone captures both global and local features and attends to location information, which expands the receptive field and improves feature extraction efficiency. For feature enhancement, an attention-optimized residual atrous spatial pyramid pooling module (A_R_ASPP) is proposed. The module uses residual connections to counter the loss of small targets in murals caused by excessive atrous convolution sampling rates, and it uses a feature attention mechanism to adaptively adjust the feature map weights according to channel importance. For multiscale feature fusion in the decoder, a dual attention-enhanced feature fusion module is proposed to improve the mural segmentation results. This module uses a cross-level aggregation strategy and an attention mechanism to weight the importance of different feature levels and obtain multilevel semantic feature representations. Compared with other models on a mural dataset, the proposed model improves the mean intersection over union (MIoU) by 3.06% and the mean pixel accuracy (MPA) by 1.81%. The results indicate that the model improves segmentation detail, efficiency and small target segmentation for mural images, offering a new method for segmenting ancient mural images.
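
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MIoU and MPA are commonly computed from a class confusion matrix. It is not the authors' evaluation code. The function names (`confusion_matrix`, `mean_iou`, `mean_pixel_accuracy`) are hypothetical, labels are assumed to be flattened integer class indices, and classes with no pixels are assumed to be skipped when averaging.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm: List[List[int]]) -> float:
    """Average over classes of correctly labelled pixels / ground-truth pixels."""
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total > 0:  # assumption: skip classes absent from the ground truth
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    truth = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    cm = confusion_matrix(truth, pred, num_classes=2)
    print(cm, mean_iou(cm), mean_pixel_accuracy(cm))
```

In the toy example, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so a single mislabelled pixel lowers MIoU more than MPA. This is one reason both metrics are usually reported together.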

Full Text