To develop a two-stage three-dimensional (3D) convolutional neural networks (CNNs) for fully automated volumetric segmentation of pancreas on computed tomography (CT) and to further evaluate its performance in the context of intra-reader and inter-reader reliability at full dose and reduced radiation dose CTs on a public dataset. A dataset of 1994 abdomen CT scans (portal venous phase, slice thickness≤3.75-mm, multiple CT vendors) was curated by two radiologists (R1 and R2) to exclude cases with pancreatic pathology, suboptimal image quality, and image artifacts (n=77). Remaining 1917 CTs were equally allocated between R1 and R2 for volumetric pancreas segmentation [ground truth (GT)]. This internal dataset was randomly divided into training (n=1380), validation (n=248), and test (n=289) sets for the development of a two-stage 3D CNN model based on a modified U-net architecture for automated volumetric pancreas segmentation. Model's performance for pancreas segmentation and the differences in model-predicted pancreatic volumes vs GT volumes were compared on the test set. Subsequently, an external dataset from The Cancer Imaging Archive (TCIA) that had CT scans acquired at standard radiation dose and same scans reconstructed at a simulated 25% radiation dose was curated (n=41). Volumetric pancreas segmentation was done on this TCIA dataset by R1 and R2 independently on the full dose and then at the reduced radiation dose CT images. Intra-reader and inter-reader reliability, model's segmentation performance, and reliability between model-predicted pancreatic volumes at full vs reduced dose were measured. Finally, model's performance was tested on the benchmarking National Institute of Health (NIH)-Pancreas CT (PCT) dataset. Three-dimensional CNN had mean (SD) Dice similarity coefficient (DSC): 0.91 (0.03) and average Hausdorff distance of 0.15 (0.09)mm on the test set. Model's performance was equivalent between males and females (P=0.08) and across different CT slice thicknesses (P>0.05) based on noninferiority statistical testing. There was no difference in model-predicted and GT pancreatic volumes [mean predicted volume 99cc (31cc); GT volume 101cc (33cc), P=0.33]. Mean pancreatic volume difference was -2.7cc (percent difference: -2.4% of GT volume) with excellent correlation between model-predicted and GT volumes [concordance correlation coefficient (CCC)=0.97]. In the external TCIA dataset, the model had higher reliability than R1 and R2 on full vs reduced dose CT scans [model mean (SD) DSC: 0.96 (0.02), CCC=0.995 vs R1 DSC: 0.83 (0.07), CCC=0.89, and R2 DSC:0.87 (0.04), CCC=0.97]. The DSC and volume concordance correlations for R1 vs R2 (inter-reader reliability) were 0.85 (0.07), CCC=0.90 at full dose and 0.83 (0.07), CCC=0.96 at reduced dose datasets. There was good reliability between model and R1 at both full and reduced dose CT [full dose: DSC: 0.81 (0.07), CCC=0.83 and reduced dose DSC:0.81 (0.08), CCC=0.87]. Likewise, there was good reliability between model and R2 at both full and reduced dose CT [full dose: DSC: 0.84 (0.05), CCC=0.89 and reduced dose DSC:0.83(0.06), CCC=0.89]. There was no difference in model-predicted and GT pancreatic volume in TCIA dataset (mean predicted volume 96cc (33); GT pancreatic volume 89cc (30), p=0.31). Model had mean (SD) DSC: 0.89 (0.04) (minimum-maximum DSC: 0.79 -0.96) on the NIH-PCT dataset. A 3D CNN developed on the largest dataset of CTs is accurate for fully automated volumetric pancreas segmentation and is generalizable across a wide range of CT slice thicknesses, radiation dose, and patient gender. This 3D CNN offers a scalable tool to leverage biomarkers from pancreas morphometrics and radiomics for pancreatic diseases including for early pancreatic cancer detection.
Read full abstract