Abstract

This paper investigates how to effectively mine contextual information across sequential images and model it jointly in medical imaging tasks. Unlike state-of-the-art methods that model sequential correlations via point-wise token encoding, we develop a novel hierarchical pattern-aware tokenization strategy. It handles distinct visual patterns independently and hierarchically, which not only ensures full flexibility of attention aggregation across different pattern representations but also preserves local and global information simultaneously. Building on this strategy, we propose the Pattern-Aware Transformer (PATrans), which features a global-local dual-path pattern-aware cross-attention mechanism for hierarchical pattern matching and propagation among sequential images. PATrans is plug-and-play and can be seamlessly integrated into various backbone networks for diverse downstream sequence modeling tasks. We demonstrate its general application paradigm across four domains and five benchmarks spanning video object detection and 3D volumetric semantic segmentation. PATrans sets a new state of the art on all five benchmarks: CVC-Video (92.3% detection F1), ASU-Mayo (99.1% localization F1), Lung Tumor (78.59% DSC), Nasopharynx Tumor (75.50% DSC), and Kidney Tumor (87.53% DSC). Code and models are available at https://github.com/GGaoxiang/PATrans.
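
To make the dual-path mechanism concrete, the sketch below shows one way a plug-and-play global-local cross-attention block between a target frame and a neighboring support frame could be wired up in PyTorch. This is a minimal illustration under our own assumptions: the class name DualPathCrossAttention, the banded local mask, and the linear fusion are hypothetical stand-ins, not the authors' implementation (which is available in the repository above).

    # Minimal sketch (our assumption, not the authors' code) of a global-local
    # dual-path cross-attention block between a target frame and a support frame.
    import torch
    import torch.nn as nn

    def local_band_mask(n: int, window: int) -> torch.Tensor:
        # Boolean attention mask: True entries are blocked. Target token i may
        # attend only to support tokens j with |i - j| <= window (a 1-D band,
        # used here as a simplification of a 2-D spatial window).
        idx = torch.arange(n)
        return (idx[None, :] - idx[:, None]).abs() > window

    class DualPathCrossAttention(nn.Module):
        def __init__(self, dim: int, num_heads: int = 8, window: int = 7):
            super().__init__()
            self.window = window
            # Global path: every target token attends to all support tokens.
            self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # Local path: attention is masked to a neighborhood of each token.
            self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.fuse = nn.Linear(2 * dim, dim)
            self.norm = nn.LayerNorm(dim)

        def forward(self, target: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
            # target, support: (batch, tokens, dim) token sequences of two frames.
            g, _ = self.global_attn(target, support, support)
            mask = local_band_mask(target.size(1), self.window).to(target.device)
            l, _ = self.local_attn(target, support, support, attn_mask=mask)
            # Fuse both paths and keep a residual connection to the target tokens.
            return self.norm(target + self.fuse(torch.cat([g, l], dim=-1)))

    # Usage: propagate context from the previous frame into the current frame.
    block = DualPathCrossAttention(dim=256)
    cur, prev = torch.randn(2, 196, 256), torch.randn(2, 196, 256)
    out = block(cur, prev)  # -> (2, 196, 256)

The banded mask is a deliberate simplification; a faithful version would use a 2-D spatial window over the token grid. The point illustrated is the dual-path structure: attending to the support frame both globally and within a local neighborhood, then fusing the two views.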
