Rail surface defect detection using a transformer-based network

Feng Guo, Jian Liu, Yu Qian, Quanyi Xie

doi:10.1016/j.jii.2024.100584

Abstract

The detection of Rail Surface Defects (RSDs) plays a critical role in railway track maintenance. Traditional image processing methods are limited by their intricate design and insufficient robustness, which restricts their broader application. Recently, deep learning-based RSD detection methods have drawn great attention. However, these methods predominantly rely on Convolutional Neural Networks (CNNs) and neglect the hierarchical relationships among different features, which impedes a fine-grained representation of RSDs. To address these issues, we propose RailFormer, a novel system that leverages Transformer-based networks for precise and efficient RSD detection. The encoder in RailFormer incorporates overlapped patch merging, efficient self-attention, and a Mix Feed-Forward Network (Mix-FFN), all designed to strengthen feature fusion from both global and local perspectives. Additionally, we implement a Criss-Cross attention module within the decoder to facilitate RSD detection while keeping computational complexity manageable. In this study, the proposed RailFormer and four other models (SegFormer, Swin Transformer, ViT, and UNet) are trained and compared. We use two datasets as the basis for performance comparison: the widely used public RSDD dataset, which comprises Type-I and Type-II images, and a customized RSD dataset. The training and visualization results show that RailFormer achieves the highest mean Intersection over Union (mIoU) and the best qualitative results on both the RSDD and the customized RSD datasets. These results demonstrate the superiority of RailFormer and underline its potential for future deployment in railway track inspection applications.
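Since the comparison above is reported in terms of mIoU, a minimal sketch of how that metric can be computed may help. The abstract does not state the evaluation protocol (class set, ignored labels, or whether scores are aggregated per image or over the whole dataset), so the function name `mean_iou`, the two-class labelling (0 = background, 1 = defect), and the toy masks below are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Average per-class IoU over classes that occur in pred or target.

    Assumption: labels are integers in [0, num_classes); classes absent
    from both masks are skipped rather than counted as IoU = 0.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel lies in the predicted region of p and the true region of t.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x3 masks: 0 = background, 1 = defect (assumed labelling).
pred_mask = [[0, 0, 1], [0, 1, 1]]
true_mask = [[0, 0, 1], [0, 0, 1]]
score = mean_iou(pred_mask, true_mask, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In this toy case the background IoU is 3/4 and the defect IoU is 2/3, so the mean is their average. Published benchmarks often accumulate intersections and unions over the whole test set before dividing, which can give a different value from averaging per-image scores.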

Full Text