Introduction: Deep-learning (DL)-based applications for intracerebral hemorrhage (ICH) detection on non-contrast computed tomography (NCCT) scans have demonstrated the potential to improve diagnostic accuracy and efficiency amid increasing radiologist workload. However, their sensitivity and specificity remain suboptimal, particularly for subtle ICH (ICH volume < 5 mL) and when confounding factors are present. This study aimed to enhance a DL-based application by reducing false positives and improving the detection of subtle ICH. Methods: This study compared two versions of a DL-based application for ICH detection (CINA-ICH, Avicenna.AI, La Ciotat, France), both using a hybrid 2D/3D architecture. The first version, CINA-ICH, was trained on 8,994 representative CT scans (1,034 ICH+) from a cohort diverse in patient characteristics and acquisition parameters. The improved version, CINA-ICH(i), was trained on the same dataset enriched with 600 challenging cases (with several confounding factors present) and included a specialized 3D network for subtle ICH detection, which was trained independently on 2,238 CT scans including 399 with subtle ICH. The evaluation dataset comprised 479 NCCT scans (131 ICH+, including 24 subtle ICH) from over 200 U.S. hospitals, 4 scanner manufacturers and 35 scanner models. Ground truth was determined by consensus among three board-certified radiologists. Sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) of CINA-ICH and CINA-ICH(i) were evaluated, with a detailed analysis of false-positive cases and subtle ICHs. Results: CINA-ICH(i) demonstrated a statistically significant (p<0.05) improvement in diagnostic performance compared to CINA-ICH: 88.5% vs. 83.2% for sensitivity, 94% vs. 85.6% for specificity, and 92.5% vs. 85% for accuracy. MCC improved from 0.65 with CINA-ICH to 0.81 with CINA-ICH(i).
CINA-ICH(i) reduced false positives from 50 to 21, minimizing the erroneous detection of spurious findings and of anatomical structures such as the falx cerebri and sinuses. Additionally, CINA-ICH(i) improved the recognition of subtle ICH, detecting 37.5% of cases compared with only 8.3% for CINA-ICH. Conclusions: The strategy implemented to enhance the DL-based algorithm successfully increased ICH detection accuracy. This advancement demonstrates the potential of optimized algorithms to better support clinical decision-making by reducing false positives and improving the detection of subtle ICH cases.