Summary

Segmentation of high-resolution X-ray microcomputed tomography (µCT) images is crucial in digital rock physics (DRP) because it affects the characterization and analysis of microscale phenomena in porous media. The complexity of geological structures and nonideal scanning conditions pose significant challenges to conventional image segmentation approaches. Motivated by the growing popularity of deep learning (DL) techniques in image processing, this work undertakes a comparative study of DL models, specifically U-Net and its variants, for segmenting multiple targets with distinct features in digital rocks, including discrete fracture networks (DFNs), pore spaces, and solid rock. In particular, DFNs occupy a smaller volume fraction than the other phases, which poses a substantial class-imbalance challenge for segmentation. The primary focus is to evaluate the architecture and feature enhancement strategies of various DL models, including U-Net, attention U-Net, residual U-Net, U-Net++, and residual U-Net++. The models were designed as 2.5D, taking a central 2D image and its two adjacent upper and lower 2D images as input to provide a pseudo-3D context. In addition, because the ground truth of segmentation is unknown for real-world digital rocks, we created a benchmark data set by inverting the segmentation process. The data synthesis started from the label images (i.e., solid rock, pore spaces, and DFNs), followed by simulating partial volume blurring, adding random background noise, and introducing ring artifacts to mimic real raw X-ray µCT images. The data set, which included various rock types (i.e., sandstone and artificial data), scanning resolutions, and magnitudes of noise and artifacts, was split into training and testing data sets at a 90%/10% ratio.
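The data synthesis and the 2.5D input described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the label codes, gray levels, mean-filter blur, sinusoidal ring pattern, and all parameter values are hypothetical choices made only for illustration, and the 2.5D stack assumes one slice above and one slice below the central slice.

```python
import numpy as np

# Hypothetical label codes and gray levels; the source does not specify them.
SOLID, PORE, FRACTURE = 0, 1, 2
GRAY = {SOLID: 0.8, PORE: 0.2, FRACTURE: 0.2}


def box_blur(img, k=3):
    """Mean filter with edge padding, a crude stand-in for partial volume blurring."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def ring_artifact(shape, amplitude=0.05, period=6.0):
    """Concentric intensity ripples around the image centre (assumed ring model)."""
    yy, xx = np.indices(shape)
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    r = np.hypot(yy - cy, xx - cx)
    return amplitude * np.sin(2 * np.pi * r / period)


def synthesize_raw(labels, noise_sigma=0.03, seed=0):
    """Label image -> pseudo-raw grayscale image: blur, then noise, then rings."""
    rng = np.random.default_rng(seed)
    gray = np.zeros(labels.shape, dtype=float)
    for code, value in GRAY.items():
        gray[labels == code] = value
    raw = box_blur(gray)
    raw += rng.normal(0.0, noise_sigma, size=labels.shape)
    raw += ring_artifact(labels.shape)
    return np.clip(raw, 0.0, 1.0)


def stack_2p5d(volume, z):
    """Return slices z-1, z, z+1 as a 3-channel input, clamping at the volume ends."""
    idx = np.clip([z - 1, z, z + 1], 0, volume.shape[0] - 1)
    return volume[idx]


# Toy label slice: solid background, a square pore, and a one-pixel-wide fracture.
labels = np.zeros((16, 16), dtype=int)
labels[4:12, 4:12] = PORE
labels[8, :] = FRACTURE
raw = synthesize_raw(labels)
```

Because the synthetic image is generated from the labels, each pseudo-raw slice comes with an exact ground truth, which is what makes pixel-wise scoring of the models possible.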
In addition to the conventional pixel-wise evaluation metrics, a physics-based metric, the permeability simulated with the lattice-Boltzmann method (LBM), provided a more comprehensive assessment. The results demonstrated that residual connections, nested architectures, and redesigned skip connections all contribute to model performance and give the residual U-Net++ the highest accuracy. The improvements were mainly at boundaries and on small targets, especially the DFNs, which dominate the interconnectivity and therefore strongly affect the permeability. This study also rigorously evaluated the efficiency and generalization of each model, demonstrating that the more sophisticated architectures remained practical to run and maintained robust performance on completely unseen data, which supports their suitability for diverse and challenging DRP applications.
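A small toy example may help show why an aggregate pixel-wise score can hide errors on a minority class such as the DFNs. The sketch below uses overall pixel accuracy and per-class intersection over union (IoU) as representative pixel-wise metrics; this choice is an assumption, since the summary does not name the metrics used, and the label codes (0 = solid, 1 = pore, 2 = fracture) are hypothetical.

```python
import numpy as np


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(pred == truth))


def per_class_iou(pred, truth, n_classes=3):
    """Intersection over union for each class; NaN if the class is absent in both."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(float(inter / union) if union else float("nan"))
    return ious


# Toy 10x10 ground truth (hypothetical codes: 0 = solid, 1 = pore, 2 = fracture).
truth = np.zeros((10, 10), dtype=int)
truth[:, 5:] = 1      # right half is pore space
truth[4, :5] = 2      # thin fracture, only 5 pixels

# A prediction that misses the fracture entirely and labels it as solid.
pred = truth.copy()
pred[4, :5] = 0

acc = pixel_accuracy(pred, truth)    # 0.95
ious = per_class_iou(pred, truth)    # [0.9, 1.0, 0.0]
print(acc, ious)
```

The prediction scores 95% pixel accuracy even though the fracture IoU is zero. Since the fractures control connectivity, a flow-based check such as simulated permeability would expose this kind of error where the aggregate pixel score does not.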