Abstract

Learning multi-scale feature representations is essential for medical image segmentation. Most existing frameworks are based on a U-shaped architecture, in which the high-resolution representation is recovered progressively by connecting different levels of the decoder with the low-resolution representations from the encoder. However, intrinsic shortcomings in complementary feature fusion prevent U-shaped networks from efficiently aggregating global and discriminative features along object boundaries. While Transformers can help model global features, their computational complexity limits their application in real-time medical scenarios. To address these issues, we propose a Cross-scale Fusion Network (CFNet) that combines a cross-scale attention module and a pyramidal module to fuse multi-stage and global context information. Specifically, we first utilize large-kernel convolutions to design a basic building block capable of extracting both global and local information. Then, we propose Bidirectional Atrous Spatial Pyramid Pooling (BiASPP), which employs atrous convolutions along bidirectional paths to capture brain tumors of various shapes and sizes. Furthermore, we introduce a cross-stage attention mechanism to reduce redundant information when merging features from two stages with different semantics. Extensive evaluation was performed on five medical image segmentation datasets, including the 3D volumetric BraTS benchmarks and the 2D retinal vessel datasets STARE, DRIVE, and CHASEDB1. CFNet-L achieves 85.74% and 90.98% Dice scores for enhanced tumor and whole tumor on BraTS2018, respectively. Moreover, our largest model, CFNet-L, outperforms other methods on 2D medical images, achieving 71.95%, 82.79%, and 80.79% sensitivity (SE) on STARE, DRIVE, and CHASEDB1, respectively. The code will be available at https://github.com/aminabenabid/CFNet
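
To make the BiASPP idea concrete, below is a minimal sketch, assuming a PyTorch implementation, of how parallel atrous-convolution branches could be refined along a forward (small-to-large dilation) and a backward (large-to-small dilation) path before fusion. The class name, dilation rates, channel widths, and fusion rule are illustrative assumptions and do not reproduce the authors' released code.

# Minimal sketch of a BiASPP-style block (illustrative, not the official CFNet code).
import torch
import torch.nn as nn


class BiASPPSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One atrous (dilated) 3x3 convolution per pyramid branch.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolutions that merge neighbouring branches along each directional path.
        n = len(dilations)
        self.fwd_merge = nn.ModuleList(nn.Conv2d(2 * out_ch, out_ch, 1) for _ in range(n - 1))
        self.bwd_merge = nn.ModuleList(nn.Conv2d(2 * out_ch, out_ch, 1) for _ in range(n - 1))
        self.project = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]

        # Forward path: propagate context from small to large dilation rates.
        fwd = [feats[0]]
        for merge, f in zip(self.fwd_merge, feats[1:]):
            fwd.append(merge(torch.cat([fwd[-1], f], dim=1)))

        # Backward path: propagate context from large to small dilation rates.
        bwd = [feats[-1]]
        for merge, f in zip(self.bwd_merge, reversed(feats[:-1])):
            bwd.append(merge(torch.cat([bwd[-1], f], dim=1)))

        # Fuse the two directional summaries into a single multi-scale feature map.
        return self.project(torch.cat([fwd[-1], bwd[-1]], dim=1))


if __name__ == "__main__":
    block = BiASPPSketch(in_ch=64, out_ch=64)
    y = block(torch.randn(1, 64, 48, 48))
    print(y.shape)  # torch.Size([1, 64, 48, 48])

The two directional passes are what distinguish this arrangement from a standard ASPP: each dilation level sees context aggregated from both smaller and larger receptive fields before the final 1x1 projection.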
