An average pooling designed Transformer for robust crack segmentation

Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-Da-Costa

doi:10.1016/j.autcon.2024.105367

Open Access

Journal: Automation in Construction
Publication Date: Mar 27, 2024
Citations: 1
License type: cc-by-nc-nd

Affiliations: University of Sydney; Hohai University

Abstract

Convolutional Neural Networks (CNNs) and Transformers have achieved impressive accuracy in crack detection for civil infrastructure. However, practical deployment demands models that are not only accurate and robust but also efficient. This paper presents PoolingCrack, a novel and efficient Transformer-based model whose hierarchical structure captures both local and global information in visual data, enabling accurate recovery of crack maps under various conditions. The encoder uses an average pooling design that is more computationally efficient than the self-attention modules of conventional Transformers, while the decoder applies feature alignment to improve the accuracy of feature fusion. Segmentation results on asphalt, concrete, and masonry cracks show that the proposed model reaches 0.4% to 6.8% higher mDS than representative models while requiring 36–62% fewer parameters. It is also more robust, achieving up to 52% higher mDS on images affected by noise and other artifacts.
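The abstract does not give the configuration of the encoder's average pooling design. To illustrate why pooling is cheaper than self-attention, the sketch below assumes a PoolFormer-style token mixer (3×3 average pooling with stride 1, minus the identity), a well-known way to replace attention with pooling; it is an illustration, not necessarily PoolingCrack's exact block. The helper names `avg_pool_mixer` and `mixing_cost` are hypothetical, and the cost figures are rough counts that ignore projections, normalization, and MLP layers.

```python
from typing import Dict, List

Grid = List[List[float]]  # one channel of an H x W feature map


def avg_pool_mixer(x: Grid, k: int = 3) -> Grid:
    """Hypothetical PoolFormer-style token mixer for a single channel.

    Computes avg_pool(x) - x with stride 1 and 'same' padding. Border
    averages use only in-bounds neighbours (like count_include_pad=False).
    Subtracting x assumes the block wraps the mixer in a residual connection.
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            # Sum the k x k neighbourhood around (i, j), skipping padding.
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
                        count += 1
            out[i][j] = total / count - x[i][j]
    return out


def mixing_cost(h: int, w: int, c: int, k: int = 3) -> Dict[str, int]:
    """Rough multiply/add counts for one token-mixing layer (projections ignored)."""
    n = h * w  # number of tokens
    return {
        # Q @ K^T and softmax(QK^T) @ V each cost about n * n * c multiply-adds.
        "self_attention": 2 * n * n * c,
        # Each output token sums k * k neighbours in each of c channels.
        "avg_pooling": n * k * k * c,
    }


if __name__ == "__main__":
    print(avg_pool_mixer([[1.0, 2.0], [3.0, 4.0]]))  # [[1.5, 0.5], [-0.5, -1.5]]
    print(mixing_cost(64, 64, 64))
```

For a 64×64 feature map with 64 channels, this rough count gives about 2.1 billion multiply-adds for self-attention versus about 2.4 million for pooling: attention grows quadratically with the number of tokens, whereas pooling grows linearly, which is the kind of saving the abstract attributes to the average pooling encoder.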
