Studies comparing human judgment with artificial intelligence (AI) for detecting clean mucosal areas in capsule endoscopy (CE) are rare. This study statistically compared gastroenterologists' judgments with AI results. Three hundred CE video clips (100 patients) were prepared. Five gastroenterologists classified the video clips into 3 groups (≥75% [high], 50%-75% [middle], and <50% [low]) according to their subjective judgment of cleanliness. Visualization scores were calculated using an AI algorithm based on the predicted visible area, and the 5 gastroenterologists' judgments and the AI results were compared. The 5 gastroenterologists rated CE video clip quality as "high" in 10.7% to 36.7% of cases and as "low" in 28.7% to 60.3% of cases. The AI rated CE video clip quality as "high" in 27.7% and as "low" in 29.7% of cases. Repeated-measures analysis of variance (ANOVA) revealed significant differences among the 6 evaluators (5 gastroenterologists and 1 AI) (P < .001). In 90 of the 300 clips (30%), all 5 gastroenterologists gave the same judgment, and the AI agreed with that judgment in 82 of these 90 clips (91.1%). The "high" and "low" judgments of the gastroenterologists and AI agreed in 95.0% and 94.9% of cases, respectively. Bonferroni's multiple comparison test showed no significant difference between 3 of the gastroenterologists and the AI (P = .0961, P = 1.0000, and P = .0676, respectively) but a significant difference between each of the other 2 and the AI (P < .0001). When evaluating CE images for cleanliness, the judgments of the 5 gastroenterologists were relatively diverse. The AI produced a relatively universal judgment that was consistent with the gastroenterologists' judgments.
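The agreement figures reported above can be illustrated with a minimal sketch. It assumes each clip carries one label per gastroenterologist plus one AI label; the function name `unanimous_agreement` and the toy labels are hypothetical and are not taken from the study. Only the counts 90, 82, and 300 come from the abstract.

```python
from typing import Sequence, Tuple


def unanimous_agreement(
    rater_labels: Sequence[Sequence[str]], ai_labels: Sequence[str]
) -> Tuple[int, int]:
    """Count clips where all raters agree, and how many of those the AI matches.

    rater_labels[i] holds every rater's label for clip i;
    ai_labels[i] is the AI label for the same clip.
    """
    n_unanimous = 0
    n_ai_match = 0
    for clip_labels, ai_label in zip(rater_labels, ai_labels):
        if len(set(clip_labels)) == 1:  # every rater gave the same label
            n_unanimous += 1
            if clip_labels[0] == ai_label:
                n_ai_match += 1
    return n_unanimous, n_ai_match


# Toy data (hypothetical, NOT from the study): 4 clips, 5 raters each.
raters = [
    ["high"] * 5,
    ["low", "low", "middle", "low", "low"],
    ["low"] * 5,
    ["middle"] * 5,
]
ai = ["high", "low", "middle", "middle"]
print(unanimous_agreement(raters, ai))  # (3, 2)

# Percentages recomputed from the counts reported in the abstract.
print(round(100 * 90 / 300, 1))  # 30.0
print(round(100 * 82 / 90, 1))   # 91.1
```

The last two lines only recompute the reported percentages from the reported counts; they do not reproduce the ANOVA or the Bonferroni comparisons.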