Cityscapes Dataset Research Articles

Transformer-based instance-level recognition has attracted increasing research attention recently owing to its strong performance. However, although attempts have been made to encode masks as embeddings in Transformer-based frameworks, how to combine mask embeddings and spatial information in a Transformer-based approach is still not fully explored. In this paper, we revisit the design of mask-embedding-based pipelines and propose an Instance Segmentation TRansformer (ISTR) with Mask Meta-Embeddings (MME), leveraging the strengths of Transformer models in encoding embedding information and incorporating spatial information from mask embeddings. ISTR incorporates a recurrent refining head that consists of a Dynamic Box Predictor (DBP), a Mask Information Generator (MIG), and a Mask Meta-Decoder (MMD). To improve the quality of mask embeddings, MME interprets the mask encoding-decoding process as a mutual information maximization problem, which unifies the objective functions of different decoding schemes, such as Principal Component Analysis (PCA) and Discrete Cosine Transform (DCT), under a meta-formulation. Under this meta-formulation, a learnable Spatial Mask Tuner (SMT) is further proposed, which fuses the spatial and embedding information produced by MIG and can significantly boost segmentation performance. The resulting variants, i.e., ISTR-PCA, ISTR-DCT, and ISTR-SMT, demonstrate the effectiveness and efficiency of incorporating mask embeddings into query-based instance segmentation pipelines. On the COCO dataset, ISTR surpasses all predominant mask-embedding-based models by a large margin and achieves competitive performance compared to concurrent state-of-the-art models. On the Cityscapes dataset, ISTR also outperforms several strong baselines. Our code has been made available at: https://github.com/hujiecpp/ISTR.
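
To make the idea of a DCT mask embedding concrete, the following is a minimal sketch, not the ISTR implementation. It assumes a square binary mask, keeps only the top-left block of low-frequency coefficients as the embedding, and decodes by zero-filling the rest; the function names (`dct_matrix`, `encode_mask`, `decode_mask`) and the `keep` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an n x n matrix (one basis vector per row)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # the constant (DC) row has its own scale
    return mat


def encode_mask(mask: np.ndarray, keep: int) -> np.ndarray:
    """Encode a square binary mask as its keep x keep low-frequency DCT block.

    Assumption: a square block is kept for simplicity; other coefficient
    orderings are possible.
    """
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask.astype(float) @ d.T  # separable 2-D DCT
    return coeffs[:keep, :keep].ravel()


def decode_mask(embedding: np.ndarray, size: int, keep: int) -> np.ndarray:
    """Decode an embedding back to a size x size binary mask."""
    d = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = embedding.reshape(keep, keep)
    recon = d.T @ coeffs @ d               # inverse 2-D DCT
    return recon >= 0.5                    # threshold back to a binary mask


# Usage: an 8 x 8 mask containing a 4 x 4 square.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
embedding = encode_mask(mask, keep=4)      # 16 numbers instead of 64 pixels
approx = decode_mask(embedding, size=8, keep=4)
```

With `keep` equal to the mask size the round trip is lossless; smaller values trade reconstruction quality for a shorter embedding, which is the kind of trade-off that a learned decoder or tuner would aim to improve.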


In this paper, we propose a method to improve the prediction accuracy of semantic segmentation methods as follows: (1) construct a neural network that has pre-processing layers based on a convolutional autoencoder ahead of a semantic segmentation network, and (2) train the entire network initialized with the weights of the pre-trained autoencoder. We applied this method to the fully convolutional network (FCN) and experimentally evaluated its prediction accuracy on the Cityscapes dataset. The Mean IoU of the proposed target model with He normal initialization is 18.7% higher than that of FCN with He normal initialization. In addition, the Mean IoU values of the modified versions of the target model are significantly higher than that of FCN with He normal initialization. The accuracy and loss curves during training showed that these gains result from improved generalization ability. All of these results provide strong evidence that the proposed method is significantly effective in improving the prediction accuracy of FCN. The proposed method has the following features: it is comparatively simple, whereas its effect on the generalization ability and prediction accuracy of FCN is significant; the increase in the number of parameters from using it is very small, whereas the increase in computation time is substantial. In principle, the proposed method can be applied to other semantic segmentation methods. For semantic segmentation, at present, there is no effective way to improve the prediction accuracy of existing methods. No one has published a method that is the same as or similar to ours, and no one has used such a method in practice. Therefore, we believe that our method is useful in practice and worthy of being widely known and used.
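
For readers unfamiliar with the Mean IoU metric quoted above, here is a minimal sketch of how it is commonly computed from flattened per-pixel labels. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function name `mean_iou` and the optional `ignore_index` parameter are hypothetical, and classes that appear in neither the prediction nor the ground truth are skipped when averaging.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean intersection-over-union across classes for flat label sequences."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # pixels with the ignore label do not count
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


# Usage: four pixels, two classes present out of three.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=3)
# Class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.
```

Averaging per-class IoU rather than per-pixel accuracy keeps small classes from being drowned out by large ones such as road or sky.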


Articles published on Cityscapes Dataset

Semantic segmentation based on enhanced gated pyramid network with lightweight attention module

Drivable Area Detection: a Comparative Study of Algorithms Based on Deep Learning

Real-time semantic segmentation via mutual optimization of spatial details and semantic information

A feature enhancement FCOS algorithm for dynamic traffic object detection

AM‐MulFSNet: A fast semantic segmentation network combining attention mechanism and multi‐branch

FLODCAST: Flow and depth forecasting via multimodal recurrent architectures

Explainer on GNN-based segmentation networks

Exploring the spatial attributes of streets in Lu Xun’s hometown of Shaoxing, China, through image semantic segmentation

ISTR: Mask-Embedding-Based Instance Segmentation Transformer

Stereo Superpixel Segmentation via Decoupled Dynamic Spatial-Embedding Fusion Network

Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers

Fully Convolutional Network-Based Self-Supervised Learning for Semantic Segmentation

Real-Time Semantic Segmentation via Spatial-Detail Guided Context Propagation

Boosting Neural Image Compression for Machines Using Latent Space Masking

IterDepth: Iterative Residual Refinement for Outdoor Self-Supervised Multi-Frame Monocular Depth Estimation

Binocular Image Dehazing via a Plain Network Without Disparity Estimation

Quantifying morphologies of developing neuronal cells using deep learning with imperfect annotations

Image Recognition Technology Applied to the Design of Mobile Platform for Warehouse Logistics Robots

MFSNet: Enhancing Semantic Segmentation of Urban Scenes with a Multi-Scale Feature Shuffle Network

Dynamic context modeling based lightweight high-resolution network for dense prediction
