High-Speed and Accurate Diagnosis of Gastrointestinal Disease: Learning on Endoscopy Images Using Lightweight Transformer with Local Feature Attention.

Shibin Wu, Qicai Liu, Ruxin Zhang, Jiayi Yan, Liyang Wang, Chengquan Li, Haoqian Wang

doi:10.3390/bioengineering10121416

Open Access

https://doi.org/10.3390/bioengineering10121416

Abstract

In response to the pressing need for robust disease diagnosis from gastrointestinal tract (GIT) endoscopic images, we propose FLATer, a fast, lightweight, and highly accurate transformer-based model. FLATer consists of a residual block, a vision transformer module, and a spatial attention block. Together, these components attend to local features and global attention concurrently, so the model leverages the capabilities of both convolutional neural networks (CNNs) and vision transformers (ViTs). We decompose the classification of endoscopic images into two subtasks: a binary classification that distinguishes normal from pathological images, and a further multi-class classification that categorizes images into specific diseases, namely ulcerative colitis, polyps, and esophagitis. FLATer achieves 96.4% accuracy in binary classification and 99.7% accuracy in ternary classification, surpassing most existing models. Notably, FLATer maintains strong performance when trained from scratch, underscoring its robustness. In addition to its high accuracy, FLATer is efficient, reaching a throughput of 16.4k images per second, which positions it as a compelling candidate for rapid disease identification in clinical practice.
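The two-subtask decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes the two classifiers are chained at inference time, which the abstract does not state explicitly. The names `diagnose`, `is_pathological`, and `disease_scores` are hypothetical placeholders for trained models and are not taken from the paper.

```python
from typing import Callable, Sequence

# Disease labels named in the abstract for the ternary subtask.
DISEASES = ("ulcerative colitis", "polyps", "esophagitis")


def diagnose(
    image: object,
    is_pathological: Callable[[object], bool],
    disease_scores: Callable[[object], Sequence[float]],
) -> str:
    """Chain the binary and ternary subtasks (an assumed inference flow).

    is_pathological: hypothetical stand-in for the binary classifier.
    disease_scores:  hypothetical stand-in for the ternary classifier,
                     returning one score per entry in DISEASES.
    """
    # Subtask 1: normal versus pathological.
    if not is_pathological(image):
        return "normal"
    # Subtask 2: pick the disease with the highest score.
    scores = disease_scores(image)
    best = max(range(len(DISEASES)), key=lambda i: scores[i])
    return DISEASES[best]


def seconds_per_image(images_per_second: float) -> float:
    """Convert a throughput figure into average time per image."""
    return 1.0 / images_per_second
```

For instance, calling `diagnose` with stub classifiers that flag the image as pathological and return scores `[0.1, 0.8, 0.1]` yields `"polyps"`. At the reported throughput of 16.4k images per second, `seconds_per_image(16400)` corresponds to roughly 0.06 ms per image on average, although batch throughput does not necessarily equal single-image latency.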

Full Text