Abstract

Medical image segmentation is crucial for accurate diagnosis. Although convolutional neural network (CNN)-based methods have made strides in recent years, they struggle to model long-range dependencies; transformer-based methods capture such dependencies but require more computational resources. The Segment Anything Model (SAM) can generate pixel-level segmentation results for natural images from sparse manual prompts, but it performs poorly on low-contrast, noisy ultrasound images. To address these issues, we propose a new medical image segmentation architecture that integrates transformer components, CNN modules, and the SAM encoder into a unified framework, capturing both long-range dependencies and local features simultaneously. In addition, we incorporate the image features extracted by SAM as prior knowledge, improving segmentation accuracy even with limited training data. To reduce the computational cost, we employ an axial attention mechanism that approximates a transformer's effect by expanding the receptive field. Rather than simply replacing the transformer components with lightweight attention modules, our model is divided into a global branch and a local branch: the global branch extracts context features with the transformer components, while the local branch processes patch tokens with the axial attention mechanism. We also construct an image pyramid to exploit internal image statistics and multiscale representations, yielding more accurate segmentation regions. This bibranch pyramid transformer (Bi-BPT) architecture is effective and robust for medical image segmentation, surpassing related segmentation architectures. Experimental results on several medical image datasets demonstrate its effectiveness.
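The axial attention mechanism mentioned above factorizes 2-D self-attention into two 1-D passes, one along rows and one along columns, which reduces the cost from O((HW)^2) to O(HW(H+W)). The sketch below is a minimal NumPy illustration of this idea, not the paper's implementation; the query/key/value projections are omitted (identity) for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x):
    """Apply self-attention along rows, then columns, of an (H, W, C)
    feature map. Two 1-D attentions approximate full 2-D attention at
    a fraction of the cost. Projections are identity for illustration."""
    h, w, c = x.shape
    # Row-wise attention: each row of W tokens attends within itself.
    row_out = np.empty_like(x)
    for i in range(h):
        q = k = v = x[i]                       # (W, C)
        attn = softmax(q @ k.T / np.sqrt(c))   # (W, W) attention weights
        row_out[i] = attn @ v
    # Column-wise attention on the row-attended result.
    col_out = np.empty_like(row_out)
    for j in range(w):
        q = k = v = row_out[:, j]              # (H, C)
        attn = softmax(q @ k.T / np.sqrt(c))   # (H, H) attention weights
        col_out[:, j] = attn @ v
    return col_out
```

After the two passes, every spatial position has (indirectly) attended to every other position, which is how the expanded receptive field is obtained without quadratic cost in the number of pixels.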
