Abstract

Automatic breast lesion segmentation in ultrasound (US) videos is an essential prerequisite for early diagnosis and treatment. This challenging task remains under-explored due to the lack of annotated US video datasets. Although recent works have achieved strong performance in natural video object segmentation by introducing promising Transformer architectures, they still suffer from spatial inconsistency and high computational costs. In this paper, we therefore first present a new benchmark dataset designed for US video segmentation. We then propose a dynamic parallel spatial-temporal Transformer (DPSTT) that improves lesion segmentation in US videos with higher computational efficiency. Specifically, the proposed DPSTT disentangles the non-local Transformer along the temporal and spatial dimensions: the temporal Transformer attends to lesion movement across frames at the same spatial regions, while the spatial Transformer focuses on similar context information between the previous and current frames. Furthermore, we propose a dynamic selection scheme that effectively samples the most relevant frames from all past frames, preventing out-of-memory failures during inference. Finally, we conduct extensive experiments to evaluate the efficacy of the proposed DPSTT on the new US video benchmark dataset.
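The two central ideas of the abstract — attention factorized into a temporal pass (same spatial position across frames) and a spatial pass (positions within a frame), plus top-k selection of the most relevant past frames to bound memory — can be illustrated with a minimal numpy sketch. This is not the paper's implementation; all function names, shapes, and the cosine-similarity selection criterion are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the second-to-last axis
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def factorized_st_attention(feats):
    """feats: (T, N, C) -- T frames, N spatial positions, C channels.
    Temporal pass: each spatial position attends across the T frames.
    Spatial pass: each frame then attends across its N positions.
    Cost is O(T*N^2 + N*T^2) vs O((T*N)^2) for joint attention."""
    t_in = feats.transpose(1, 0, 2)                     # (N, T, C)
    t_out = attention(t_in, t_in, t_in).transpose(1, 0, 2)  # back to (T, N, C)
    return attention(t_out, t_out, t_out)               # spatial pass, (T, N, C)

def select_frames(memory, query, k):
    """Hypothetical dynamic selection: pick the k past frames whose pooled
    features (memory: (M, D)) are most cosine-similar to the current frame's
    pooled feature (query: (D,)). Returns their indices."""
    sims = memory @ query / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(query) + 1e-8
    )
    return np.argsort(-sims)[:k]
```

Factorizing the attention keeps the memory bank's footprint linear in the number of retained frames, which is why a selection scheme like `select_frames` suffices to cap inference memory.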
