Background
To construct deep learning models for colonoscopy quality control using different architectures and to explore their decision-making mechanisms.

Methods
A total of 4,189 colonoscopy images were collected from two medical centers, covering different levels of bowel cleanliness, the presence of polyps, and the cecum. Using these data, eight pre-trained models based on CNN and Transformer architectures underwent transfer learning and fine-tuning. Model performance was evaluated using metrics such as AUC, precision, and F1 score. Perceptual hash functions were employed to detect image changes, enabling real-time monitoring of colonoscopy withdrawal speed. Model interpretability was analyzed with techniques such as Grad-CAM and SHAP. Finally, the best-performing model was converted to ONNX format and deployed on device terminals.

Results
On the validation set, the EfficientNetB2 model outperformed the other CNN- and Transformer-based architectures, achieving an accuracy of 0.992, with a precision of 0.991, a recall of 0.989, and an F1 score of 0.990. On the test set, it achieved an average AUC of 0.996, with a precision of 0.948 and a recall of 0.952. Interpretability analysis identified the specific image regions the model relied on for its decisions. After conversion to ONNX format and deployment on device terminals, the model achieved an average inference speed of over 60 frames per second.

Conclusions
The AI-assisted quality control system based on the EfficientNetB2 model integrates four key quality control indicators for colonoscopy. This integration enables medical institutions to comprehensively manage and improve these indicators with a single model, showing promising potential for clinical application.
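
To illustrate the withdrawal-speed monitoring step described in the Methods, the sketch below shows one way perceptual hashing can flag frame-to-frame changes: a simple average hash compared by Hamming distance. This is a minimal sketch, not the authors' implementation; the hash size, distance threshold, and function names are assumptions introduced here for illustration.

```python
# Minimal sketch of perceptual-hash-based frame-change detection.
# Assumes frames are available as PIL images; hash size and threshold are illustrative.
import numpy as np
from PIL import Image


def average_hash(frame: Image.Image, hash_size: int = 8) -> np.ndarray:
    """Downscale to hash_size x hash_size, grayscale, and threshold at the mean."""
    small = frame.convert("L").resize((hash_size, hash_size), Image.BILINEAR)
    pixels = np.asarray(small, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()


def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Count the number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))


def frame_changed(prev_frame: Image.Image, curr_frame: Image.Image, threshold: int = 10) -> bool:
    """Flag a visible scene change when the hash distance exceeds a tuned threshold."""
    return hamming_distance(average_hash(prev_frame), average_hash(curr_frame)) > threshold
```

In such a scheme, the rate of flagged changes per unit time could serve as a proxy for how quickly the endoscope is being withdrawn; the actual hashing function and decision rule used in the study may differ.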
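The ONNX deployment step could look roughly like the following sketch, which exports a PyTorch EfficientNet-B2 classifier and runs it with onnxruntime. The number of output classes, input resolution, file names, and execution provider are placeholders, not the paper's actual configuration.

```python
# Sketch of exporting a fine-tuned EfficientNet-B2 classifier to ONNX and running inference.
# NUM_CLASSES, the input resolution, and file names are assumptions for illustration only.
import numpy as np
import onnxruntime as ort
import torch
from torchvision import models

NUM_CLASSES = 5  # placeholder: the actual number of quality-control categories is not given here

# Build EfficientNet-B2 and replace the classification head
# (in practice, weights would be loaded from the fine-tuned checkpoint).
model = models.efficientnet_b2(weights=None)
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, NUM_CLASSES)
model.eval()

# Export to ONNX with a dynamic batch dimension.
dummy = torch.randn(1, 3, 260, 260)
torch.onnx.export(
    model,
    dummy,
    "efficientnet_b2_qc.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# Run inference on the exported graph.
session = ort.InferenceSession("efficientnet_b2_qc.onnx", providers=["CPUExecutionProvider"])
frame = np.random.rand(1, 3, 260, 260).astype(np.float32)  # stand-in for a preprocessed frame
logits = session.run(["logits"], {"input": frame})[0]
print("Predicted class:", int(logits.argmax(axis=1)[0]))
```

On device terminals, the reported throughput of over 60 frames per second would additionally depend on the hardware and execution provider chosen, which are not specified in this sketch.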