Deep learning site classification model for automated photodocumentation in upper GI endoscopy (with video)

Upasana Agrawal, John B League, Shivaram P Arunachalam, Cadman L Leggett, Nayantara Coelho-Prabhu, Jeffrey R Fetzer, Priyadharshini Sivasubramaniam, Devanshi N Damani, Liang Yen Liu

doi:10.1016/j.igie.2023.01.002

Abstract

Background and Aims
Photodocumentation during EGD can be automated and standardized using deep learning (DL) models for anatomic site classification. However, EGD video data contain a substantial number of image frames whose quality is suboptimal for computational analysis (eg, off-center or blurry). We aimed to develop a DL model that extracts high-quality frames from EGD video data for anatomic classification.

Methods
The photodocumentation algorithm consisted of 2 image filters that extract high-quality image frames (appropriately centered, with minimal to no blurriness), which were then classified into 1 of 8 anatomic sites: esophagus, gastroesophageal junction, stomach body, fundus, angularis, antrum, duodenal bulb, and duodenum. Model training, testing, and internal validation were performed using 8231 EGD still images and 26,103 video-derived images, evenly split among anatomic sites. Images were independently rated per category by 2 gastroenterologists. External validation was performed using an independent dataset of 2142 EGD still images. Model performance (accuracy, F1 score) on 5 EGD videos (6308 frames) was analyzed using a majority-vote strategy over windows of 5, 10, 20, and 30 consecutive frames.

Results
Internal testing and external validation for site classification showed overall accuracies of 98.1% and 95.0%, respectively. F1 scores across anatomic sites ranged from 90.0% to 99.0% in internal testing and from 92.0% to 97.0% in external validation. When applied to EGD video data, overall accuracies ranged from 89.7% to 94.8% across sampling window sizes.

Conclusions
We present a DL model capable of extracting high-quality frames from EGD video data and performing subsequent anatomic site classification with acceptable accuracy, enabling automated photodocumentation for consistent study quality and video indexing for annotated study review.
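
The video evaluation summarized above (per-frame site predictions aggregated by majority vote over windows of consecutive frames, then scored by accuracy and per-site F1) can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not say whether windows overlap, how ties are broken, or how a window's reference label is defined. The sketch therefore assumes non-overlapping windows, a kept trailing partial window, ties resolved in favor of the label seen first, and a reference label taken as the majority of the rater labels in the same window. The function names `window_majority`, `accuracy`, `f1_per_class`, and `evaluate_video` are hypothetical.

```python
from collections import Counter

# The 8 anatomic sites named in the abstract.
SITES = [
    "esophagus", "gastroesophageal junction", "stomach body", "fundus",
    "angularis", "antrum", "duodenal bulb", "duodenum",
]


def window_majority(labels, window):
    """Collapse per-frame labels into one label per window (hypothetical helper).

    Assumptions: non-overlapping windows of `window` consecutive frames, and a
    trailing partial window is kept. Ties go to the label seen first in the
    window, because max() returns the first maximal key in insertion order.
    """
    out = []
    for start in range(0, len(labels), window):
        counts = Counter(labels[start:start + window])
        out.append(max(counts, key=counts.get))
    return out


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred, classes):
    """One-vs-rest F1 per class: 2TP / (2TP + FP + FN).

    A class absent from both sequences has an undefined F1; it is reported
    as 0.0 here (an arbitrary convention).
    """
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def evaluate_video(frame_true, frame_pred, window):
    """Window-level accuracy and per-site F1 for one video."""
    true_w = window_majority(frame_true, window)
    pred_w = window_majority(frame_pred, window)
    return accuracy(true_w, pred_w), f1_per_class(true_w, pred_w, SITES)


if __name__ == "__main__":
    # Toy 10-frame sequence (illustrative only, not study data).
    frame_true = ["antrum"] * 5 + ["duodenum"] * 5
    frame_pred = ["antrum", "antrum", "fundus", "antrum", "fundus",
                  "duodenum", "duodenal bulb", "duodenum", "duodenum",
                  "duodenal bulb"]
    print(accuracy(frame_true, frame_pred))                    # 0.6
    print(evaluate_video(frame_true, frame_pred, window=5)[0])  # 1.0
```

In this toy case, 4 of 10 frames are misclassified, but each 5-frame window still has the correct majority, so window-level accuracy rises from 0.6 to 1.0. This illustrates why aggregating over consecutive frames can absorb isolated per-frame errors. Larger windows smooth more but can also blur short transitions between adjacent sites.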

Full Text