Abstract

Transformers have proven their ability to model long-range dependencies. However, medical images have strong local structure, so extracting features with a Transformer alone not only introduces redundant information that increases computational cost, but is also detrimental to capturing local details. To address these issues, we propose a network based on dynamic positioning and region-aware attention, which adopts a two-stage feature extraction strategy. In the shallow layers, we design Dynamic Positioning Attention (DPA), which localizes key feature information, constructs a variable window around it, and then computes attention within that window. DPA improves the learning of local details and reduces computation. In the deep layers, Bi-Level Routing Attention (BRA) is used to discard irrelevant key-value pairs, achieving content-aware sparse attention over the dispersed deep semantic information and improving computational efficiency. Experiments on different types of datasets show that our method achieves advanced performance.
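To make the region-aware stage concrete, below is a minimal, single-head sketch of bi-level routing attention in PyTorch: a coarse region-to-region affinity routes each query region to its top-k most relevant regions, and token-level attention is then computed only over the gathered key-value pairs. The region grid size, top-k value, absence of projection layers, and function name are illustrative assumptions, not the authors' exact DPA/BRA implementation.

```python
import torch
import torch.nn.functional as F


def bi_level_routing_attention(x, num_regions_per_side=4, topk=2):
    """x: (B, H, W, C) feature map. Returns an attended map of the same shape."""
    B, H, W, C = x.shape
    S = num_regions_per_side
    hr, wr = H // S, W // S                                  # tokens per region side

    # 1) Partition the feature map into S*S regions of hr*wr tokens each.
    regions = x.view(B, S, hr, S, wr, C).permute(0, 1, 3, 2, 4, 5)
    regions = regions.reshape(B, S * S, hr * wr, C)          # (B, R, T, C)
    q = k = v = regions                                      # single head, no projections

    # 2) Region-level routing: coarse affinity between mean-pooled regions,
    #    keeping only the top-k routed regions per query region.
    q_region = q.mean(dim=2)                                 # (B, R, C)
    k_region = k.mean(dim=2)                                 # (B, R, C)
    affinity = q_region @ k_region.transpose(-1, -2)         # (B, R, R)
    topk_idx = affinity.topk(topk, dim=-1).indices           # (B, R, k)

    # 3) Gather key/value tokens only from the routed regions.
    idx = topk_idx[..., None, None].expand(-1, -1, -1, hr * wr, C)      # (B, R, k, T, C)
    k_sel = torch.gather(k[:, None].expand(-1, S * S, -1, -1, -1), 2, idx)
    v_sel = torch.gather(v[:, None].expand(-1, S * S, -1, -1, -1), 2, idx)
    k_sel = k_sel.reshape(B, S * S, topk * hr * wr, C)
    v_sel = v_sel.reshape(B, S * S, topk * hr * wr, C)

    # 4) Fine-grained token attention restricted to the gathered key-value pairs.
    attn = F.softmax(q @ k_sel.transpose(-1, -2) / C ** 0.5, dim=-1)
    out = attn @ v_sel                                       # (B, R, T, C)

    # 5) Undo the region partition.
    out = out.view(B, S, S, hr, wr, C).permute(0, 1, 3, 2, 4, 5)
    return out.reshape(B, H, W, C)


if __name__ == "__main__":
    x = torch.randn(2, 32, 32, 64)
    y = bi_level_routing_attention(x)
    print(y.shape)  # torch.Size([2, 32, 32, 64])
```

The shallow-stage DPA is not sketched here, since its variable-window construction is specific to the proposed method; the sketch only illustrates the content-aware sparsity idea that the deep stage relies on.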
