Introduction: Using deep learning (DL) contouring algorithms to automatically calculate right ventricular ejection fraction (RVEF) is challenging because right ventricular (RV) geometry is highly variable. The robustness of a DL algorithm that quantifies RVEF from cardiac magnetic resonance (CMR) images depends heavily on the quality of the dataset used to train and cross-validate it. Cross-validating a DL algorithm on an insufficiently heterogeneous dataset during development may adversely affect its performance when it is deployed in a clinical environment with diverse disease states and variable image quality. This study was designed to test the hypothesis that using a heterogeneous CMR dataset with a wide range of RV pathology during the cross-validation phase of DL algorithm development would result in more accurate quantification of RV function.
Methods: We identified 100 CMR exams in which RV function could not be accurately quantified using a commercially available DL algorithm (DL1), resulting in errors >5%. We then applied a new algorithm (DL2) that was developed with images from a greater diversity of disease states and acquisition techniques included during the cross-validation phase. RVEF was measured automatically using both DL1 and DL2 (Figure A) and compared to measurements made in a core laboratory (Core).
Results: DL2-RVEF correlated highly with Core-RVEF (r=0.87), whereas DL1-RVEF correlated only modestly (r=0.42) (Figure B). DL2 also resulted in a smaller RVEF error than DL1 when compared to Core-RVEF (Figure C). For DL1, small errors were rare: no cases had <5% error, and errors >10% were noted in 35% of cases. In contrast, for DL2, 70% of cases were within 5% error, and only 10% of cases showed errors >10%.
Conclusions: Using a diverse dataset during the cross-validation phase of DL algorithm development considerably improved the accuracy of automated RVEF analysis.
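
The comparison reported in the Results (correlation with the core laboratory, and the share of cases in each error band) can be illustrated with a minimal sketch. This is not the study's analysis code. The function names and the toy values are hypothetical. The sketch also assumes that "error" means the absolute difference in RVEF percentage points between the automated and core-laboratory measurements, which the abstract does not state explicitly.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def error_bins(auto_rvef, core_rvef):
    """Percentage of cases in each absolute-error band.

    Assumption: error = |automated RVEF - core RVEF| in percentage points.
    """
    errors = [abs(a - c) for a, c in zip(auto_rvef, core_rvef)]
    n = len(errors)
    return {
        "<5": 100 * sum(e < 5 for e in errors) / n,
        "5-10": 100 * sum(5 <= e <= 10 for e in errors) / n,
        ">10": 100 * sum(e > 10 for e in errors) / n,
    }

# Hypothetical toy values (RVEF in %), NOT data from the study.
core = [55.0, 40.0, 30.0, 62.0, 48.0]
auto = [53.0, 47.0, 42.0, 60.0, 45.0]

print("r =", round(pearson_r(auto, core), 2))
print(error_bins(auto, core))
```

Running the same two functions on the DL1 and DL2 measurements against the core-laboratory values would yield the kind of per-algorithm summary quoted in the Results.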