Abstract

Introduction
Cardiac computed tomography (CCT) images provide vast spatial and temporal information. Segmentation of the structure of interest is necessary to obtain quantitative information; however, manual segmentation of cardiac substructures is highly time-consuming.

Objective
This study aims to develop a robust deep-learning model for cardiac substructure segmentation in 3D CCT images.

Method
We initially developed a 3D U-Net architecture, incorporating extensive augmentation, for segmentation of the Left Ventricle Myocardium (LVM), the blood cavities of the Left and Right Ventricles (LV, RV), and the Left and Right Atria (LA, RA). This model was trained on CT images from 20 patients with available manual segmentations. We then applied this initial model to datasets from two centers, covering different generations of scanners, visually inspected the resulting segmentations, selected the highest-quality ones (modifying them where necessary), and excluded low-quality data. We continued fine-tuning the model with the new data until the dataset reached 456 patients. Because multiple images per patient were acquired during this model's development, we generated patient-level splits from the entire new dataset to develop a new model and avoid any data leakage in the reported metrics. The model was developed using data from Center 1 (5-fold cross-validation) and then evaluated on an external test set from Center 2 (126 patients) as well as on 4D images from Center 1. Various metrics were calculated to assess the model's performance.

Results
In the internal test set, using 5-fold CV, the model demonstrated high segmentation accuracy, achieving an overall Dice coefficient of 0.98, with individual scores of 0.96 for LVM, 0.98 for LV, 0.98 for RV, 0.99 for RA, and 0.98 for LA.
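The Dice coefficient reported above measures the voxel-wise overlap between a predicted mask and the ground truth. A minimal sketch of the per-label computation, assuming binary NumPy masks (the function name and toy arrays are illustrative, not from the study):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards empty masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Toy 3D example: two 4x4x4 cubes offset by one voxel in each dimension.
a = np.zeros((8, 8, 8), dtype=bool); a[2:6, 2:6, 2:6] = True  # 64 voxels
b = np.zeros((8, 8, 8), dtype=bool); b[3:7, 3:7, 3:7] = True  # 64 voxels
# Overlap is a 3x3x3 region (27 voxels) -> Dice = 2*27/128 ≈ 0.42
```

In a multi-label setting such as this one, the coefficient is computed separately for each substructure (LVM, LV, RV, RA, LA) and then summarized.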
We assessed the automatic quantification capability of the segmentation model in 256 patients, obtaining R2 values greater than 0.92 for both LV end-diastolic and end-systolic volumes. On the external dataset, the model likewise maintained robust performance, achieving an overall Dice coefficient of 0.96, with individual scores of 0.95 for LVM, 0.98 for LV, 0.97 for RV, 0.98 for RA, and 0.94 for LA.

Conclusion
We provided a large cohort of ground-truth segmentations for different cardiac CT substructures using a human-in-the-loop strategy. Subsequently, we developed a deep-learning model for highly accurate automated segmentation of these substructures. This model allows for fast and fully automatic segmentation and quantification of 3D and 4D cardiac CT images, from which various quantitative metrics can be extracted for the different cardiac substructures.
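The R2 agreement between automatic and manual ventricular volumes is the coefficient of determination. A minimal sketch, using hypothetical LV end-diastolic volumes in mL (the numbers are illustrative only, not the study's data):

```python
import numpy as np

def r_squared(manual: np.ndarray, auto: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot,
    treating the manual volumes as the reference."""
    ss_res = np.sum((manual - auto) ** 2)        # residual sum of squares
    ss_tot = np.sum((manual - manual.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical paired volumes (mL) for five patients
manual = np.array([120.0, 150.0, 95.0, 180.0, 130.0])
auto = np.array([118.0, 153.0, 97.0, 176.0, 133.0])
# Small residuals relative to the spread of volumes -> R2 close to 1
```

An R2 above 0.92, as reported here, indicates that the automatically derived volumes explain almost all of the variance in the manual reference volumes.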