ABSTRACT Foreign objects that intrude onto high-speed railway lines endanger train operation. One existing solution is to analyse remote sensing images to locate hazardous areas along the line, which gives staff time to investigate. Considering the spatial and temporal resolution characteristics of existing remote sensing technologies in identifying floating objects, and the reality of rapid land-use change, this paper applies semantic segmentation of remote sensing imagery to identify ground areas where floating objects may be generated, and provides early warnings to staff along the route. The regions to be analysed, however, differ in semantics and scale. To address these challenges, this paper proposes a Transformer-based Dual-branch Parallel Fusion Network (DPFNet) aimed at enhancing multi-class semantic segmentation of remote sensing images. To leverage global contextual information, we introduce a Swin Transformer-based backbone network, whose self-attention captures the context of the entire scene and thereby supports better segmentation. For multi-scale semantic features, we propose an approach that combines independent branch feature representation with a Multi-scale Feature Space Fusion Module (MFSFM). The former enriches multi-scale information, while the latter fuses features across different levels to capture diverse semantic features. Experimental results demonstrate that DPFNet can effectively identify hidden-danger areas, and that fusing multi-scale features enables the network to identify and segment risk areas of different sizes more accurately, improving segmentation accuracy and robustness. The work is therefore of significance for railway safety operation with ‘prevention’ at its core.
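To make the idea of fusing features across different levels concrete, the following is a minimal sketch of one generic form of multi-scale fusion: coarser feature maps are upsampled to the finest resolution and concatenated along the channel axis. It is not the MFSFM itself, whose internal design is not specified in this abstract. The function names `upsample_nearest` and `fuse_multiscale`, the nearest-neighbour upsampling, and the feature-map shapes are all assumptions chosen for illustration.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiscale(features):
    """Bring every (C, H, W) map to the finest resolution, then concatenate channels.

    Assumes square maps whose sizes divide the finest size exactly
    (hypothetical setup, not taken from the paper).
    """
    target = max(f.shape[1] for f in features)
    resized = [upsample_nearest(f, target // f.shape[1]) for f in features]
    return np.concatenate(resized, axis=0)


# Illustrative three-level feature pyramid (shapes are assumptions).
rng = np.random.default_rng(0)
pyramid = [
    rng.standard_normal((8, 16, 16)),   # fine level: few channels, high resolution
    rng.standard_normal((16, 8, 8)),    # middle level
    rng.standard_normal((32, 4, 4)),    # coarse level: many channels, low resolution
]
fused = fuse_multiscale(pyramid)
print(fused.shape)
```

In a trained network the concatenated tensor would typically pass through learned layers before the segmentation head; the sketch covers only the resolution alignment and channel stacking.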