Objective. Celiac disease (CD) has emerged as a significant global public health concern, with an estimated worldwide prevalence of approximately 1%. However, existing research on CD in China is confined mainly to case reports and small-scale case analyses, and a substantial population of patients in the Xinjiang region remains undiagnosed. This study aims to develop a novel, high-performance, lightweight deep learning model, trained on endoscopic images from CD patients in Xinjiang, to improve the accuracy of CD diagnosis. Approach. We propose a CNN-Transformer hybrid architecture tailored to diagnosing CD from endoscopic images. Within this architecture, a multi-scale spatial adaptive selective kernel convolution feature attention module proves particularly effective for diagnosing CD. This module dynamically captures the salient features in the local channel feature maps that correspond to the distinct lesion manifestations seen in endoscopic images of CD-affected sites such as the duodenal bulb, the descending duodenum, and the terminal ileum, and it extracts and reinforces the spatial information specific to each lesion type. The model therefore attends both to the diverse characteristics of lesions and to their spatial distribution. In addition, the global representation of the feature map obtained from the Transformer is fused with the local information extracted by the convolutional layers, so that the two complement each other. Main results. In the experiments, the model achieved an accuracy of 98.38%, a specificity of 99.04%, an F1-score of 98.66%, and a precision of 99.38%. Significance. This study introduces a deep learning network that combines global feature response with local feature extraction. The architecture shows promise for the accurate diagnosis of CD from endoscopic images captured at different anatomical sites.
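For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, specificity, precision, and F1-score are conventionally computed from confusion-matrix counts. It assumes a binary setting (CD versus non-CD), which the abstract does not state explicitly. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations, not values from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives, false negatives.
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)   # fraction of true negatives correctly identified
    precision = tp / (tp + fp)     # fraction of predicted positives that are correct
    recall = tp / (tp + fn)        # sensitivity; needed for the F1-score
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the formulas (not the study's data).
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1-score is the harmonic mean of precision and recall, so it depends on recall (sensitivity) even though the abstract does not list recall separately.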