AbstractRecent years have witnessed notable advancements in medical image segmentation through deep convolutional neural networks. However, a notable limitation lies in the local operation of convolution, which hinders the ability to fully exploit global semantic information. To overcome the challenges prevalent in medical image segmentation, the feature ensemble network with multi‐scale atrous transformer is proposed. At the core of the approach lies the multi‐scale contextual integration module, which is based on the multi‐scale atrous transformer and facilitates contextual integration of multi‐level features. To extract discriminative fine‐grained features of the target region, a hybrid attention mechanism that synergistically combines spatial and channel attention, thereby sharpening the model's focus on crucial target information within high‐level features, is incorporated. Additionally, the channel‐aware feature reconstruction module is introduced as an innovative component engineered to tackle feature similarity issues across different categories. This module performs feature reconstruction based on channel perception, effectively widening the feature gap between categories and enhancing the segmentation capability. It is worth mentioning that our approach surpasses the state‐of‐the‐art method using three benchmark datasets in medical image segmentation.