Track defects are emerging gradually as urban rail transit develops, yet few studies have addressed real-time diagnosis of track conditions. Although intelligent data-driven methods have the potential to diagnose track conditions, sufficient labeled data are difficult to acquire in practice. This study proposes a dynamics simulation-assisted transfer learning (TL) method for track condition diagnosis under label scarcity. First, a dynamics model of axle-box bearings that accounts for the service environment is established. Based on this model, a large number of axle-box vibration signals corresponding to healthy and defective track conditions are simulated. A wavelet transform is applied to these signals to characterize their time-frequency energy distributions as time-frequency maps, which serve as the source-domain data. Similarly, the time-frequency maps of signals collected during vehicle operation serve as the target-domain data. Subsequently, a sub-domain alignment TL network is constructed to map the source-domain and target-domain data into a deep feature space. In this network, unlabeled target-domain data are classified to obtain pseudo labels. The Wasserstein distance and multiple domain discriminators are then employed to achieve category-wise alignment between the two domains, and a feature centroid-driven loss function is applied to further reduce intra-class variation, ultimately enabling accurate knowledge transfer from simulated signals to collected signals. Finally, a two-level sliding window algorithm is designed to detect abnormal segments of the axle-box vibration signal, which are then diagnosed by the trained network. The proposed method is validated through a transfer diagnosis experiment using simulated and collected signals.
This study provides a promising solution for diagnosing different track conditions, which is important for ensuring running safety in urban rail transit.
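
As an illustration of the detection step, the following is a minimal sketch of one way a two-level sliding window could isolate abnormal vibration segments. The abstract does not specify what the two levels are, so the scheme here is an assumption: a coarse pass flags windows whose RMS is an outlier relative to the median, and a fine pass then locates the highest-energy sub-window inside each flagged window. The function names `window_rms` and `two_level_detect`, the window sizes, the threshold factor `k`, and the synthetic test signal are all hypothetical and are not taken from the study.

```python
import numpy as np


def window_rms(signal, size, step):
    """Return (start indices, RMS values) for sliding windows over a 1-D signal."""
    starts = np.arange(0, len(signal) - size + 1, step)
    rms = np.array([np.sqrt(np.mean(signal[s:s + size] ** 2)) for s in starts])
    return starts, rms


def two_level_detect(signal, coarse=1024, fine=128, k=3.0):
    """Hypothetical two-level scheme: coarse screening, then fine localization.

    Returns a list of (start, end) sample indices of suspected abnormal segments.
    """
    signal = np.asarray(signal, dtype=float)

    # Level 1 (assumed): non-overlapping coarse windows, flagged when their RMS
    # exceeds median + k * MAD. The small constant keeps the threshold strictly
    # above the median when all windows are identical.
    c_starts, c_rms = window_rms(signal, coarse, coarse)
    med = np.median(c_rms)
    mad = np.median(np.abs(c_rms - med)) + 1e-12
    flagged = c_starts[c_rms > med + k * mad]

    # Level 2 (assumed): inside each flagged coarse window, slide a fine window
    # with 50% overlap and keep the sub-window with the highest RMS.
    segments = []
    for s in flagged:
        f_starts, f_rms = window_rms(signal[s:s + coarse], fine, fine // 2)
        best = int(s + f_starts[np.argmax(f_rms)])
        segments.append((best, best + fine))
    return segments


# Synthetic stand-in for an axle-box vibration signal: a periodic baseline
# with an offset disturbance injected at samples 3200..3299.
base = np.sin(2 * np.pi * np.arange(64) / 64)
sig = np.tile(base, 128)
sig[3200:3300] += 5.0
segments = two_level_detect(sig)
print(segments)
```

In a pipeline like the one described, each returned segment would be converted to a time-frequency map and passed to the trained network. The RMS statistic and the median-based threshold are design choices of this sketch only; other indicators, such as kurtosis or band energy, could be used in the same structure.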