Background and motivations: Antenatal or prenatal hydronephrosis (AHN) is a common kidney complication in unborn children. While AHN is generally benign and resolves over time, in severe cases it can cause serious kidney damage or even organ failure due to excessive waste accumulation. Regardless of severity, AHN must be clinically monitored for resolution, with treatment plans and medications revised according to updated prognoses. Kidney ultrasound imaging is one of the most common methods of monitoring AHN, but grading of the condition is highly subjective, and clinicians may select inappropriate therapies or surgical interventions as a result. New approaches are required to differentiate subjects who can be managed without surgical intervention from those who require life-saving operations.

Methods: An end-to-end deep learning framework was developed to sequentially detect ultrasound regions of interest, segment kidneys from ultrasound images, and classify AHN severity. We propose the novel Kidney Ultrasound Segmentation Network (KUSNet) for kidney segmentation from ultrasound images, and the Prenatal Hydronephrosis Classification Network (PHCNet) for hydronephrosis severity stratification according to the Society of Fetal Urology (SFU) standards. The ground truth kidney masks were generated by two radiologists with more than five years of working experience, while the SFU-based annotations of AHN severity were made by two senior radiologists and three senior urologists with more than ten years of domain expertise. At each stage, the performance of the proposed models was assessed both quantitatively and qualitatively against state-of-the-art networks in the respective fields.

Results: The proposed KUSNet for ultrasound kidney segmentation achieved 97.6% accuracy, 97.4% precision, 97.6% recall (sensitivity), 97.5% F1-score, 95.5% IoU (Jaccard score), and 92.1% Dice score, outperforming several state-of-the-art networks.
The novel PHCNet reached 93.9% accuracy, 93.7% precision, 93.9% recall, 93.8% specificity, and 89.0% F1-score subject-wise when performing multiclass stratification of AHN severity on segmented kidney regions.

Conclusion: Artificial intelligence-based tools can reliably classify AHN severity and reduce inter- and intra-observer bias, thereby aiding clinicians in the rapid selection of appropriate treatments and surgeries. Moreover, segmenting kidney regions beforehand significantly boosts AHN severity classification performance.
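
For readers unfamiliar with the segmentation metrics quoted in the Results, the following is a minimal sketch of their standard pixel-wise definitions for a single binary kidney mask. It is illustrative only: the function name `segmentation_metrics`, the helper `_ratio`, and the toy masks are assumptions introduced here, not the authors' evaluation code, and the abstract does not state how scores were averaged over images or classes, so the reported figures may follow a different aggregation.

```python
import numpy as np


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return float(numerator) / float(denominator) if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for one predicted/ground-truth binary mask pair.

    Hypothetical helper: standard textbook definitions, not the paper's code.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = np.sum(pred & truth)    # kidney pixels correctly predicted
    fp = np.sum(pred & ~truth)   # background predicted as kidney
    fn = np.sum(~pred & truth)   # kidney pixels that were missed
    tn = np.sum(~pred & ~truth)  # background correctly predicted

    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "iou": _ratio(tp, tp + fp + fn),           # Jaccard score
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy 3x4 masks (made-up data, for illustration only).
toy_truth = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
toy_pred = [[0, 1, 1, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0]]

toy_metrics = segmentation_metrics(toy_pred, toy_truth)
print(toy_metrics)
```

On the toy masks there are 3 true positives, 1 false positive, 1 false negative, and 7 true negatives, giving an IoU of 0.6 and a Dice score of 0.75.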