Dual attention transformer network for pixel-level concrete crack segmentation considering camera placement

Yingjie Wu, Shaoqi Li, Jinge Zhang, Yancheng Li, Yang Li, Yingqiao Zhang

doi:10.1016/j.autcon.2023.105166

Abstract

Pixel-level crack segmentation remains challenging for two reasons: the trade-off between computational cost and accuracy, and the small size of real-world cracks, which are typically submillimeter in width and therefore occupy few pixels in an image. To address these challenges, this paper proposes the Pixel Crack Transformer Network (PCTNet) and investigates the impact of different camera placements on network performance. PCTNet adopts a hierarchical structure with a Cross-Scale PatchEmbedding Layer and a Dual Attention Transformer Block, enabling the generation of multi-scale feature maps and the fusion of global and local features. PCTNet reduces computational cost by up to 64% compared with transformer networks while outperforming both convolutional and transformer networks, achieving 95.89% precision, 93.77% recall, 94.8% F1-score, and 90.53% mIoU. Furthermore, this work introduces the Crack-R dataset, which comprises crack images captured at varying distances, facilitating the evaluation of segmentation accuracy in real-world scenarios with different crack-to-pixel ratios.
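
The four metrics reported above can be illustrated with a minimal sketch for binary crack/background masks. It uses the standard confusion-matrix definitions and assumes that mIoU is the mean of the crack-class and background-class IoU; the abstract does not state the paper's exact evaluation protocol. The function name `segmentation_metrics` and the toy masks are hypothetical and are not taken from the paper.

```python
def segmentation_metrics(pred, truth):
    """Pixel-level metrics for two flat binary masks (1 = crack, 0 = background).

    Illustrative only: standard definitions, not the paper's evaluation code.
    """
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # crack predicted, crack present
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # crack predicted, background present
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # crack missed
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background predicted correctly

    def ratio(num, den):
        # Guard against empty classes (e.g. an image with no crack pixels).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    # Assumption: mIoU averages the IoU of the two classes.
    miou = (iou_crack + iou_background) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy 8-pixel example (hypothetical data): 2 TP, 1 FP, 1 FN, 4 TN.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)

# F1 is the harmonic mean of precision and recall; the reported precision
# and recall imply an F1-score close to the reported 94.8%.
reported_f1 = 2 * 0.9589 * 0.9377 / (0.9589 + 0.9377)
```

Because cracks cover only a small fraction of each image, the background IoU is usually close to 1, so the crack-class IoU largely determines how mIoU varies between networks under this definition.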

Full Text