Current 2D and 3D image-based crack detection methods for transportation infrastructure often struggle with noise robustness and feature diversity. To overcome these challenges, this paper proposes CSF-CrackNet, a self-adaptive 2D-3D image fusion model that uses channel and spatial modules for automated pavement crack segmentation. CSF-CrackNet consists of four parts: a feature enhanced and field sensing (FEFS) module, a channel module, a spatial module, and a semantic segmentation module. A multi-feature image dataset, including color images, depth images, and color-depth overlapped images, was established using a vehicle-mounted 3D imaging system. Results show that the mean intersection over union (mIOU) of most models under the CSF-CrackNet framework can be increased to above 80%. Compared with the original RGB and depth images, image fusion increases the average mIOU by 10% and 5%, respectively. The ablation experiments and weight significance analysis further demonstrate that CSF-CrackNet significantly improves semantic segmentation performance by balancing the information between 2D and 3D images.
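The abstract does not specify the internals of the channel and spatial modules. As a rough illustration only, and not the authors' actual FEFS or module definitions, the following NumPy sketch shows the general idea of fusing 2D (RGB) and 3D (depth) feature maps by channel-wise concatenation followed by channel and spatial re-weighting; all function names and shapes here are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Global average pooling per channel yields one
    # scalar weight per channel, squashed to (0, 1) by a sigmoid.
    w = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # Averaging across channels yields a (H, W) saliency map that
    # re-weights every spatial location.
    m = sigmoid(feat.mean(axis=0))           # shape (H, W)
    return feat * m[None, :, :]

def fuse(rgb_feat, depth_feat):
    # Hypothetical fusion step: concatenate 2D and 3D features along
    # the channel axis, then apply channel and spatial re-weighting
    # so the network can balance the two modalities.
    x = np.concatenate([rgb_feat, depth_feat], axis=0)
    return spatial_attention(channel_attention(x))

rgb = np.random.rand(3, 8, 8)    # toy RGB feature map
depth = np.random.rand(1, 8, 8)  # toy depth feature map
fused = fuse(rgb, depth)
print(fused.shape)  # (4, 8, 8)
```

In a real network these pooling-plus-sigmoid weights would be produced by learned layers, so that the balance between color and depth information is optimized during training rather than fixed.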