Objective. Current segmentation practice for thoracic cancer radiotherapy (RT) treats the whole heart as a single organ, despite the increased risk of cardiac toxicity from irradiation of specific cardiac substructures. Segmenting up to 15 different cardiac substructures can be very time-intensive, especially because of their differing volumes and anatomical variation across patients. In this work, a new deep learning (DL)-based mutual enhancing strategy is introduced for accurate and automatic segmentation of cardiac substructures, particularly smaller ones such as the coronary arteries. Approach. Our proposed method consists of three subnetworks: a retina U-net, a classification module, and a segmentation module. The retina U-net serves as the backbone network and learns deep features from the whole heart. Whole-heart feature maps from the retina U-net are then passed to four sets of classification modules, which generate classification localization maps of the coronary arteries, great vessels, chambers of the heart, and valves of the heart. Each classification module is synchronized with its corresponding downstream segmentation module in a bootstrapping manner, allowing the two to share their encoding paths and thereby enhance each other. We evaluated our method on three datasets: 1) institutional CT datasets (55 subjects), 2) the publicly available Multi-Modality Whole Heart Segmentation (MM-WHS) challenge datasets (120 subjects), and 3) the Automated Cardiac Diagnosis Challenge (ACDC) datasets (100 subjects). For the institutional datasets, we performed five-fold cross-validation on the training data (45 subjects) and performed inference on separate hold-out data (10 subjects). For each subject, 15 cardiac substructures were manually contoured by a resident physician and evaluated by an attending radiation oncologist.
For the MM-WHS dataset, we trained the network on 100 subjects and performed inference on a separate hold-out set of 20 subjects, each with 7 cardiac substructures. For the ACDC dataset, we performed five-fold cross-validation on 100 subjects, each with 3 cardiac substructures. We compared the proposed method against four network architectures: 3D U-net, mask R-CNN, mask scoring R-CNN, and the proposed network without the classification module. Segmentation accuracies were statistically compared using the Dice similarity coefficient, Jaccard index, 95% Hausdorff distance, mean surface distance, root mean square distance, center of mass distance, and volume difference. Main results. For small substructures, the proposed method generated cardiac substructure segmentations with significantly higher accuracy (P < 0.05) than the four competing methods, especially for coronary arteries such as the left anterior descending artery (CA-LADA) and right coronary artery (CA-RCA). For large substructures (i.e. chambers of the heart), our method yielded results comparable to the mask scoring R-CNN method and significantly (P < 0.05) better segmentation accuracy than 3D U-net and mask R-CNN. Significance. A new DL-based mutual enhancing strategy was introduced for automatic segmentation of cardiac substructures. Overall, the results of this work demonstrate the ability of the proposed method to improve the segmentation accuracy of smaller substructures such as coronary arteries without substantially compromising that of larger substructures. Fast and accurate segmentation of up to 15 substructures could potentially be used to rapidly generate substructure contours for subsequent physician review, improving clinical workflow.
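The seven evaluation metrics listed above can be computed from a pair of binary 3D masks. The following is a minimal NumPy sketch under stated assumptions, not the authors' evaluation code. It assumes boundary voxels are those with at least one 6-connected background neighbour, that the 95% Hausdorff distance, mean surface distance, and root mean square distance are taken over the pooled symmetric surface distances (conventions differ between toolkits), that volume difference is the absolute difference in cubic centimetres, and that both masks are non-empty. The helper names `surface_points`, `surface_distances`, and `segmentation_metrics` are hypothetical.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates (mm) of boundary voxels.

    Assumption: a boundary voxel is a foreground voxel with at least one
    6-connected background neighbour (voxels outside the volume count as background).
    """
    m = mask.astype(bool)
    padded = np.pad(m, 1)  # pad with False so edge voxels are treated as boundary
    interior = m.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Shift the padded mask by one voxel and crop back to the original shape.
            neighbour = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            interior &= neighbour
    boundary = m & ~interior
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def surface_distances(pred, ref, spacing):
    """Pooled symmetric nearest-surface distances (mm), computed by brute force.

    Memory grows as O(N * M) in the number of surface voxels, so this is only
    suitable for small structures or illustration.
    """
    p = surface_points(pred, spacing)
    r = surface_points(ref, spacing)
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    # pred -> ref distances followed by ref -> pred distances
    return np.concatenate([d.min(axis=1), d.min(axis=0)])


def segmentation_metrics(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Overlap, distance, and volume metrics for two non-empty binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    spacing = np.asarray(spacing, dtype=float)
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    voxel_mm3 = float(np.prod(spacing))
    dist = surface_distances(pred, ref, spacing)
    # Centers of mass in physical coordinates (mm).
    com_pred = np.argwhere(pred).mean(axis=0) * spacing
    com_ref = np.argwhere(ref).mean(axis=0) * spacing
    return {
        "dice": float(2.0 * inter / (pred.sum() + ref.sum())),
        "jaccard": float(inter / union),
        "hd95_mm": float(np.percentile(dist, 95)),
        "msd_mm": float(dist.mean()),
        "rmsd_mm": float(np.sqrt((dist ** 2).mean())),
        "com_dist_mm": float(np.linalg.norm(com_pred - com_ref)),
        # Assumption: absolute volume difference reported in cc (1 cc = 1000 mm^3).
        "vol_diff_cc": abs(int(pred.sum()) - int(ref.sum())) * voxel_mm3 / 1000.0,
    }


if __name__ == "__main__":
    ref = np.zeros((10, 10, 10), dtype=bool)
    ref[2:6, 2:6, 2:6] = True
    pred = np.zeros_like(ref)
    pred[3:7, 2:6, 2:6] = True  # same cube shifted by one voxel along axis 0
    print(segmentation_metrics(pred, ref))
```

For the one-voxel shift in the usage example, the overlap is 48 of 64 voxels per mask, so the Dice coefficient is 0.75, the Jaccard index is 0.6, and the center of mass distance is 1 mm. In practice, a distance-transform-based surface distance computation would replace the brute-force pairwise step for full-size CT volumes.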