Deep learning-based detection of rail surface defects largely relies on supervision with pixel-level ground truths, and detection accuracy degrades on unseen image domains. To reduce the laborious labeling process while still identifying rail surface defects accurately, this research develops a U-Net-based network for pixel-level defect detection that integrates spatial and channel attention (SA) modules, multistep domain adaptation (MSDA), and progressive histogram matching (PHM). MSDA uses an unlabeled intermediate dataset to reduce the gap between the source and target domains. To further improve the adaptation of the transferred model, PHM and self-training are employed to enhance detection accuracy. The model is trained in the source domain with ground truths and then tested in the target domain, which has no annotations. Experimental results on the RSDD dataset and the Rail-Joint dataset demonstrate the applicability and effectiveness of the proposed approach. By alleviating the domain discrepancy, the developed model raises the average precision (AP) score of unsupervised detection from 0.103 to 0.861, which is competitive with the AP score of 0.895 achieved by state-of-the-art supervised models. The novelty of this research lies in a hybrid deep learning approach that achieves competitive detection accuracy despite a low overlap ratio between domains and with little data labeling.
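
To illustrate the idea behind histogram matching between image domains, the following is a minimal sketch in Python with NumPy. It is not the authors' PHM implementation: the function names `match_histogram` and `progressive_match`, the `steps` parameter, and the linear blending used to make the matching "progressive" are all assumptions made for illustration, and the images are synthetic grayscale arrays rather than RSDD or Rail-Joint data.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap gray levels of `source` so its CDF follows that of `reference`."""
    # Unique gray levels, their positions in the flattened image, and counts.
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source gray level, find the reference gray level at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)


def progressive_match(source, reference, steps=4):
    """Assumed 'progressive' variant: blend the source toward the fully matched
    image in `steps` equal increments (hypothetical, for illustration only)."""
    matched = match_histogram(source, reference)
    src = source.astype(np.float64)
    alphas = np.linspace(0.0, 1.0, steps + 1)[1:]
    return [(1.0 - a) * src + a * matched for a in alphas]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a dark "source" image and a bright "target" image.
    source = rng.integers(0, 100, size=(32, 32)).astype(np.uint8)
    target = rng.integers(120, 256, size=(32, 32)).astype(np.uint8)
    for i, stage in enumerate(progressive_match(source, target), start=1):
        print(f"step {i}: mean intensity = {stage.mean():.1f}")
```

Under these assumptions, each stage moves the source intensity distribution closer to the target, and the final stage is the fully matched image. Intermediate stages of this kind could be used to expose a model to a gradual shift in appearance instead of an abrupt one.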