Abstract

Convolutional Neural Networks (CNNs) and Transformers have achieved impressive accuracy in crack detection for civil infrastructure. However, practical deployments demand models that are not only accurate and robust but also efficient. This paper presents PoolingCrack, a novel and efficient Transformer-based model that uses a hierarchical structure to capture both local and global information in visual data, enabling accurate recovery of crack maps under various conditions. The encoder incorporates an average pooling design that is more computationally efficient than the traditional self-attention modules in Transformers, while the decoder uses feature alignment to improve the accuracy of feature fusion. Segmentation results on asphalt, concrete, and masonry cracks show that the proposed model achieves 0.4% to 6.8% higher mDS than representative models while requiring 36–62% fewer parameters. It is also more robust, with up to 52% higher mDS under noise and other artifacts.
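
The efficiency argument for average pooling can be illustrated with a minimal sketch. It assumes the pooling operator acts as the token mixer where self-attention would otherwise sit; the abstract does not specify the exact encoder design, so the function name `avg_pool_mixer`, the 3 x 3 window, and the edge handling are all hypothetical choices made for illustration. The point is that the operation has no learnable parameters and its cost grows linearly with the number of pixels, whereas self-attention grows quadratically with the number of tokens.

```python
import numpy as np


def avg_pool_mixer(x, k=3):
    """Hypothetical pooling token mixer (illustrative, not the paper's code).

    x: feature map of shape (C, H, W).
    Returns an array of the same shape in which every position holds the
    mean of its k x k spatial neighborhood. Border positions average only
    over the pixels that actually exist (padding is not counted).
    """
    c, h, w = x.shape
    p = k // 2
    # Zero-pad the spatial dimensions so every window has k x k slots.
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), mode="constant")
    # Mask of valid (non-padded) pixels, used to count window members.
    valid = np.pad(np.ones((h, w)), ((p, p), (p, p)), mode="constant")

    total = np.zeros((c, h, w), dtype=float)
    count = np.zeros((h, w), dtype=float)
    # Accumulate the k * k shifted views: O(k^2 * C * H * W) work,
    # i.e. linear in the number of spatial positions.
    for dy in range(k):
        for dx in range(k):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count


features = np.arange(9, dtype=float).reshape(1, 3, 3)
mixed = avg_pool_mixer(features)
print(mixed.shape)  # same shape as the input
```

Because the mixer only averages neighbors, it captures local context at each stage; in a hierarchical encoder, progressively downsampled stages would be what widens the effective receptive field.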
