Multiphase composite building materials have variable internal structures and complex components, which makes their images difficult to segment. To address this, this study introduces a novel image segmentation network named UT-FusionNet. Building on the U-Net model, UT-FusionNet integrates a Transformer module and a tensor concatenation and fusion mechanism to overcome the limited receptive field of convolutional networks. UT-FusionNet is employed to segment CT images of conventional concrete, grout consolidation bodies, and fiber-reinforced concrete. The results demonstrate that UT-FusionNet achieves superior segmentation accuracy and robustness, with Accuracy (ACC), Intersection over Union (IoU), and Dice scores exceeding 90% across all subtasks. The mean accuracy metrics are 99.44%, 96.70%, and 98.31%, respectively. This end-to-end deep learning network offers robust support for detailed structural analysis, damage detection, and digital modeling of multiphase composite building materials.
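For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how ACC, IoU, and Dice are commonly computed for a single binary mask. It is not taken from the study: the function names `confusion_counts` and `segmentation_scores` are hypothetical, the masks are assumed to be flattened lists of 0/1 labels, and the study's own evaluation (for example, how multi-phase labels are averaged) may differ.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equally sized binary masks (flat lists of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def segmentation_scores(pred, truth):
    """Return ACC, IoU and Dice for one foreground class (hypothetical helper)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    acc = (tp + tn) / total if total else 0.0
    # IoU = TP / (TP + FP + FN); treated as 1.0 when both masks are empty.
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    # Dice = 2*TP / (2*TP + FP + FN); same empty-mask convention as IoU.
    dice_den = 2 * tp + fp + fn
    dice = 2 * tp / dice_den if dice_den else 1.0
    return {"ACC": acc, "IoU": iou, "Dice": dice}


if __name__ == "__main__":
    # Toy 8-pixel example, not data from the study.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_scores(predicted, reference))
```

Because ACC counts true negatives, it can stay high on images dominated by background, which is one reason IoU and Dice are usually reported alongside it.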