In the context of rapid industrialization, efficiently detecting metal corrosion areas has become a critical task in preventing material damage. Unlike conventional semantic segmentation targets, metal corrosion characteristics vary significantly in color, texture, and size. Traditional image segmentation methods need improvement in scenarios involving occlusions, shadows, and defects. This paper proposes a convolution and sequence encoding combined network, MCD-Net, for metal corrosion area segmentation. First, a visual Transformer sequence encoder is introduced into the convolutional encoder–decoder network to enhance global information processing capabilities and establish long-range feature dependencies. A feature fusion method based on an attention module is proposed to enhance the model’s ability to recognize corrosion boundaries, thereby enhancing segmentation accuracy and model robustness. Finally, in the model’s decoding stage, a score-based multi-scale feature enhancement method is employed to emphasize significant features in the corrosion areas. Experimental results indicate that this method attained an F1 score of 84.53% on a public corrosion dataset, demonstrating the model’s deeper understanding and reasoning capabilities for shadow and defect features, as well as excellent noise resistance performance.
Read full abstract