MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation.

Nikolaos Detsikas, Ioannis Pratikakis, Nikolaos Mitianoudis

doi:10.3390/jimaging10060125

Abstract

Semantic segmentation, the differentiation and identification of objects or entities in a visual scene, is a fundamental task in computer vision. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in segmentation performance. The continuous pursuit of better results on popular evaluation metrics has led to very large architectures that require significant computational power, making them prohibitive for real-time applications such as autonomous driving. In this paper, we propose a model that combines a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while maintaining a low-complexity network that can be used in real-time applications.
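The abstract does not say how the fuser and scaler are built, so the following is only a minimal NumPy sketch of the general idea of merging two decoder outputs, not the authors' implementation. Everything in it is an assumption made for illustration: the names `conv1x1`, `merge_decoders`, and the `params` keys are hypothetical, the scaler is assumed to be a 1x1 convolution that produces two per-pixel weights summing to 1, and the fuser is assumed to be a 1x1 convolution that maps the concatenated, scaled maps to class logits.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """Pointwise (1x1) convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_decoders(t_logits, c_logits, params):
    """Hypothetical merge of transformer and CNN decoder outputs.

    t_logits, c_logits: (K, H, W) class maps from the two decoders.
    The layer shapes below are assumptions, not taken from the paper.
    """
    stacked = np.concatenate([t_logits, c_logits], axis=0)  # (2K, H, W)
    # Assumed scaler: one weight per decoder at every pixel, summing to 1.
    weights = softmax(conv1x1(stacked, params["scaler_w"], params["scaler_b"]))
    scaled = np.concatenate(
        [weights[0:1] * t_logits, weights[1:2] * c_logits], axis=0
    )
    # Assumed fuser: combines the 2K scaled channels into K class logits.
    return conv1x1(scaled, params["fuser_w"], params["fuser_b"])

# Usage with random placeholder data (19 is the number of Cityscapes
# evaluation classes; the 4x8 spatial size is arbitrary).
rng = np.random.default_rng(0)
K, H, W = 19, 4, 8
params = {
    "scaler_w": rng.normal(size=(2, 2 * K)),
    "scaler_b": np.zeros(2),
    "fuser_w": rng.normal(size=(K, 2 * K)),
    "fuser_b": np.zeros(K),
}
fused = merge_decoders(
    rng.normal(size=(K, H, W)), rng.normal(size=(K, H, W)), params
)
labels = fused.argmax(axis=0)  # per-pixel class prediction, shape (H, W)
```

In a trained model the weights in `params` would be learned jointly with the encoder and decoders; here they are random, so `labels` carries no meaning beyond showing the shapes involved.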


Journal: Journal of Imaging
Publication Date: May 21, 2024
License type: CC BY 4.0
