A controlled observer study was conducted to compare a method for automatic image segmentation with conventional user-guided segmentation of right and left kidneys from planning computed tomography (CT) images. Deformable shape models called m-reps were used to automatically segment right and left kidneys from 12 target CT images, and the results were compared with careful manual segmentations performed by two human experts. M-rep models were trained on manual segmentations from a collection of images that did not include the targets. Segmentation using m-reps began with interactive initialization to position the kidney model over the target kidney in the image data. Fully automatic segmentation then proceeded through two stages at successively smaller spatial scales. At the first stage, a global similarity transformation of the kidney model was computed to position the model closer to the target kidney. The similarity transformation was followed by large-scale deformations based on principal geodesic analysis (PGA). During the second stage, the medial atoms comprising the m-rep model were deformed one by one, and this procedure was iterated until no further changes were observed. The transformations and deformations at both stages were driven by optimizing an objective function with two terms. The first term penalized the currently deformed m-rep by an amount proportional to its deviation from the mean m-rep derived from PGA of the training segmentations. The second term measured the model-to-image match, that is, how well the trained intensity template for the currently deformed m-rep matched the corresponding intensity data in the target image. Human and m-rep segmentations were compared using quantitative metrics provided in a toolset called Valmet. Metrics reported in this article include (1) percent volume overlap; (2) mean surface distance between two segmentations; and (3) maximum surface separation (Hausdorff distance).
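As a reading aid, the two-term objective function described above can be written schematically as follows. All symbols are assumptions introduced here for illustration and are not taken from the study: m is the currently deformed m-rep, \bar{m} the mean m-rep from PGA, I the target image, D an unspecified measure of deviation from the mean, M the intensity-template match score, and \alpha > 0 a weighting constant. The sign convention (minimizing a penalty minus a match score) is likewise assumed.

```latex
\hat{\mathbf{m}}
  = \arg\min_{\mathbf{m}}
    \Big[\, \alpha \, D(\mathbf{m}, \bar{\mathbf{m}}) \;-\; M(\mathbf{m}, I) \,\Big]
```

Under this reading, the first-stage similarity transformation and PGA deformations and the second-stage per-atom deformations would all search over m at different scales while scoring candidates with the same two terms.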
Averaged over all kidneys, comparison of the two human segmentations gave a mean surface separation of 0.12 cm, a mean Hausdorff distance of 0.99 cm, and a mean volume overlap of 88.8%. Between human and m-rep segmentations, the mean surface separation was 0.18-0.19 cm, the mean Hausdorff distance was 1.14-1.25 cm, and the mean volume overlap was 82-83%. Overall in this study, the best m-rep kidney segmentations were at least as good as careful manual slice-by-slice segmentations performed by two experienced humans, and the worst were no worse than typical segmentations from our clinical setting. The mean surface separations for human-m-rep comparisons were slightly larger than for human-human comparisons but still in the subvoxel range, and volume overlap and maximum surface separation were slightly better for human-human comparisons. These results were expected because experimental factors favored the human-human comparison. In particular, m-rep agreement with humans appears to have been limited largely by fundamental differences between manual slice-by-slice and true three-dimensional segmentation, imaging artifacts, image voxel dimensions, and the use of an m-rep model that produced a smooth surface across the renal pelvis.
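The three metrics reported above can be illustrated with a minimal sketch. This is not the Valmet implementation. It assumes that segmentations are given as sets of voxel indices (for overlap) and as lists of surface points in physical units (for distances), that percent volume overlap is intersection over union, and that the mean surface distance pools the closest-point distances from both directions. Valmet may define these quantities differently, and all function names are hypothetical.

```python
import math


def volume_overlap_percent(voxels_a, voxels_b):
    """Percent overlap, ASSUMED here to be intersection over union."""
    a, b = set(voxels_a), set(voxels_b)
    union = a | b
    if not union:
        raise ValueError("both segmentations are empty")
    return 100.0 * len(a & b) / len(union)


def _closest_point_distances(points_a, points_b):
    """For each point of surface A, the distance to the nearest point of B."""
    if not points_a or not points_b:
        raise ValueError("surfaces must contain at least one point")
    return [min(math.dist(p, q) for q in points_b) for p in points_a]


def mean_surface_distance(points_a, points_b):
    """Symmetric mean of closest-point distances (ASSUMED pooled variant)."""
    d_ab = _closest_point_distances(points_a, points_b)
    d_ba = _closest_point_distances(points_b, points_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hausdorff_distance(points_a, points_b):
    """Maximum surface separation: the larger of the two directed maxima."""
    d_ab = _closest_point_distances(points_a, points_b)
    d_ba = _closest_point_distances(points_b, points_a)
    return max(max(d_ab), max(d_ba))


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    seg_a = {(0, 0, 0), (1, 0, 0)}
    seg_b = {(1, 0, 0), (2, 0, 0)}
    surf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    surf_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(volume_overlap_percent(seg_a, seg_b))
    print(mean_surface_distance(surf_a, surf_b))
    print(hausdorff_distance(surf_a, surf_b))
```

The brute-force closest-point search is quadratic in the number of surface points, so it suits only small examples; a spatial index or distance transform would be the usual choice for full-resolution surfaces. Because the Hausdorff distance is a maximum, a single outlying region such as the renal pelvis can dominate it even when the mean surface distance stays small.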