P2AT: Pyramid pooling axial transformer for real-time semantic segmentation

Mohammed A.M Elhassan, Changjun Zhou, Amina Benabid, Abuzar B.M Adam

doi:10.1016/j.eswa.2024.124610

Abstract

Recently, Transformer-based models have achieved promising results in various vision tasks owing to their ability to model long-range dependencies. However, transformers are computationally expensive, which limits their use in real-time applications such as autonomous driving. In addition, efficient selection and fusion of local and global features are vital for accurate dense prediction, especially in driving-scene understanding tasks. In this paper, we propose a real-time semantic segmentation architecture named Pyramid Pooling Axial Transformer (P2AT). P2AT takes a coarse feature from the CNN encoder to produce scale-aware contextual features, which are then combined with a multi-level feature aggregation scheme to produce enhanced contextual features. Specifically, we introduce a pyramid pooling axial transformer to capture intricate spatial and channel dependencies, leading to improved semantic segmentation performance. Then, we design a Bidirectional Fusion module (BiF) to combine semantic information at different levels. Meanwhile, a Global Context Enhancer (GCE) is introduced to compensate for the inadequacy of concatenating different semantic levels. Finally, a decoder block is proposed to help maintain a larger receptive field. We evaluate P2AT variants on three challenging scene-understanding datasets. In particular, our P2AT variants achieve state-of-the-art results on the CamVid dataset: 80.5%, 81.0%, and 81.1% for P2AT-S, P2AT-M, and P2AT-L, respectively. Furthermore, our experiments on Cityscapes and Pascal VOC 2012 demonstrate the efficiency of the proposed architecture. The source code will be available at https://github.com/mohamedac29/P2AT.
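
The abstract does not describe the internals of the pyramid pooling axial transformer, so the following is only a generic illustration of the two ideas its name combines, not the authors' implementation. It assumes a coarse feature map stored as a nested list of shape H x W x C, pools it to a few grid sizes (pyramid pooling), and runs unprojected scaled dot-product attention first along the width axis and then along the height axis of each pooled map (axial attention). All function names, the `scales` default, and the absence of learned query/key/value projections are assumptions made for this sketch.

```python
import math


def adaptive_avg_pool(feat, out_size):
    """Average-pool an H x W x C nested list to out_size x out_size x C."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for i in range(out_size):
        # Bin boundaries: floor for the start, ceiling for the end.
        r0, r1 = (i * h) // out_size, -(-((i + 1) * h) // out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            n = (r1 - r0) * (c1 - c0)
            row.append([
                sum(feat[y][x][k] for y in range(r0, r1) for x in range(c0, c1)) / n
                for k in range(c)
            ])
        pooled.append(row)
    return pooled


def self_attention(seq):
    """Scaled dot-product self-attention over a list of C-dim vectors.

    Assumption: queries, keys, and values are the inputs themselves
    (no learned projections), to keep the sketch minimal.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(wt * v[k] for wt, v in zip(weights, seq)) / z for k in range(d)])
    return out


def axial_attention(feat):
    """Attend along the width axis, then along the height axis."""
    rows = [self_attention(row) for row in feat]
    h, w = len(rows), len(rows[0])
    cols = [self_attention([rows[y][x] for y in range(h)]) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]


def pyramid_axial_context(feat, scales=(1, 2, 4)):
    """Return {scale: context map} with axial attention applied per pooled scale."""
    return {s: axial_attention(adaptive_avg_pool(feat, s)) for s in scales}


if __name__ == "__main__":
    # Toy 4 x 4 x 2 feature map; values are arbitrary.
    toy = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
    for scale, ctx in pyramid_axial_context(toy).items():
        print(scale, len(ctx), len(ctx[0]), len(ctx[0][0]))
```

Attending along one axis at a time means each position is compared with the H + W positions in its row and column, not with all H x W positions, which is the usual motivation for axial attention in latency-sensitive settings. How P2AT fuses the per-scale outputs is not stated in the abstract and is deliberately left out here.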

Journal: Expert Systems With Applications
Publication Date: Jun 28, 2024
Citations: 1
