For ultra-high-resolution (UHR) image semantic segmentation, striking a balance between computational efficiency and storage space is a crucial research direction. This paper proposes a Feature Fusion Network (EFFNet) to improve UHR image semantic segmentation performance. EFFNet designs a score map that can be embedded into the network for training purposes, enabling the selection of the most valuable features to reduce storage consumption, accelerate speed, and enhance accuracy. In the fusion stage, we improve upon previous redundant multiple feature fusion methods by utilizing a transformer structure for one-time fusion. Additionally, our combination of the transformer structure and multibranch structure allows it to be employed for feature fusion, significantly improving accuracy while ensuring calculations remain within an acceptable range. We evaluated EFFNet on the ISPRS two-dimensional semantic labeling Vaihingen and Potsdam datasets, demonstrating that its architecture offers an exceptionally effective solution with outstanding semantic segmentation precision and optimized inference speed. EFFNet substantially enhances critical performance metrics such as Intersection over Union (IoU), overall accuracy, and F1-score, highlighting its superiority as an architectural innovation in ultra-high-resolution remote sensing image semantic segmentation.
Read full abstract