Background/Objectives: This study aimed to establish a histology-based gold standard for evaluating artificial intelligence (AI)-based caries detection systems on proximal surfaces in bitewing images. Methods: Extracted human teeth were used to simulate intraoral situations, including caries-free teeth, teeth with artificially created defects and teeth with natural proximal caries. All 153 simulations were radiographed from seven angles, resulting in 1071 in vitro bitewing images. An expert performed the histological examination of carious lesion depth twice. Thirty examiners analyzed all the radiographs for caries. Results: We generated in vitro bitewing images to evaluate the performance of AI-based carious lesion detection against a histological gold standard. Taken together, the examiners achieved a sensitivity of 0.565, a Matthews correlation coefficient (MCC) of 0.578 and an area under the curve (AUC) of 76.1. The histology receiver operating characteristic (ROC) curve significantly outperformed the examiners' ROC curve (p < 0.001). The examiners distinguished induced defects from true caries in 54.6% of cases and correctly classified 99.8% of all teeth. The expert's repeated caries classification of the histological images showed a high level of agreement (intraclass correlation coefficient (ICC) = 0.993). Examiner performance varied with caries depth (p ≤ 0.008), except between E2 and E1 lesions (p = 1), while central beam eccentricity, gender, occupation and experience had no significant influence (all p ≥ 0.411). Conclusions: This study established an unbiased dataset for evaluating AI-based caries detection on proximal surfaces in bitewing images and comparing it with human judgement. The dataset provides a standardized assessment for fair comparison between AI technologies and helps dental professionals select reliable diagnostic tools.
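For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity and the MCC can be computed from the four cells of a binary confusion matrix (examiner call versus histological reference). The function names and the counts are hypothetical and chosen for illustration only; they are not taken from the study, and the abstract does not describe how the authors computed their values.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly carious surfaces that the examiner flagged as carious."""
    total = tp + fn
    return tp / total if total else float("nan")


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a 2x2 confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not data from the study).
example = {"tp": 40, "tn": 45, "fp": 5, "fn": 10}

example_sensitivity = sensitivity(example["tp"], example["fn"])
example_mcc = mcc(**example)

print(f"sensitivity = {example_sensitivity:.3f}")
print(f"MCC         = {example_mcc:.3f}")
```

Unlike sensitivity, the MCC uses all four cells, so it also penalizes false positives, which is one reason it is often reported alongside sensitivity when sound and carious surfaces are unevenly represented.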