Abstract

Pixel-level crack segmentation remains a challenging task due to the trade-off between computational cost and accuracy, as well as the small size of real-world cracks, typically submillimeter in width, resulting in limited pixels for analysis. To address these challenges, this paper proposes a Pixel Crack Transformer Network (PCTNet) to investigate the impact of different camera placements on network performance. PCTNet adopts a hierarchical structure with Cross-Scale PatchEmbedding Layer and Dual Attention Transformer Block, enabling the generation of multi-scale feature maps and the fusion of global and local features. PCTNet achieves a reduction of up to 64% in computational cost compared to transformer networks while outperforming both convolutional and transformer networks, achieving 95.89% precision, 93.77% recall, 94.8% F1-score, and 90.53% mIoU. Furthermore, this work introduces Crack-R dataset, which encompasses crack images captured at varying distances, facilitating the evaluation of segmentation accuracy in real-world scenarios with different crack-to-pixel ratios.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call