Decoder Module Research Articles

Scene text recognition (STR) aims to automatically recognize text in natural scenes. Unlike regular document text, text in natural scenes is irregularly shaped, set against complex backgrounds, and often distorted or blurred, which makes STR challenging. To address distorted, blurred, and low-resolution text in natural scenes, this paper proposes an STR model based on an HRNet encoder and a dual-branch decoder. The model consists mainly of an encoder module and a dual-branch decoder module, in which a super-resolution branch and a recognition branch run in parallel. In the encoder module, HRNet performs cross-parallel aggregation of multi-resolution representations during feature extraction and outputs four feature maps at different resolutions. A supervised attention module is also used to strengthen the learning of important feature information. In the decoder module, the super-resolution branch takes the highest-resolution feature map from the encoder as input and restores the image by upsampling with transposed convolution. In the recognition branch, the four feature maps at different resolutions pass through independent transposed convolution layers for multiscale fusion and are then fed into an attention-based decoder for text recognition. To improve recognition accuracy, feature extraction in the encoder is jointly supervised by the super-resolution branch loss and the recognition branch loss. The super-resolution branch is used only during training and is discarded at test time to reduce model complexity. The proposed model is trained on the Synth90K and SynthText datasets and tested on seven natural scene datasets. Compared with classical models such as ASTER, TextSR, and SCGAN, the proposed model achieves higher recognition accuracy, with better results on irregular and blurred datasets such as IC15, SVTP, and CUTE80.
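The multiscale fusion step relies on transposed convolutions bringing feature maps of different resolutions to a common size. The sketch below only works through the size arithmetic for that idea, using the standard transposed-convolution output formula (dilation 1). The feature-map widths and the choice of kernel size equal to stride with zero padding are assumptions made for illustration; the abstract does not state the layer settings the authors used.

```python
def tconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def layer_for(size, target):
    """Pick one hypothetical transposed-conv layer that maps `size` to `target`.

    Assumption: kernel == stride == upsampling factor and padding == 0,
    which scales the input length by exactly that factor.
    """
    if target % size != 0:
        raise ValueError("target must be an integer multiple of size")
    factor = target // size
    return {"kernel": factor, "stride": factor, "padding": 0}


# Hypothetical widths of four feature maps, highest resolution first.
widths = [64, 32, 16, 8]
target = max(widths)

fused_widths = []
for w in widths:
    cfg = layer_for(w, target)
    out = tconv_out(w, cfg["kernel"], cfg["stride"], cfg["padding"])
    fused_widths.append(out)
    print(f"width {w:>2} -> {out} with kernel={cfg['kernel']}, stride={cfg['stride']}")
```

Once every map reaches the same spatial size, the maps can be combined (for example by concatenation or summation) before being passed to a decoder; which combination the paper uses is not stated in the abstract.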


Traffic target tracking is a core task in intelligent transportation systems because it supports scene understanding and autonomous driving. Most state-of-the-art (SOTA) multiple object tracking (MOT) methods adopt a two-step procedure: object detection followed by data association. Object detection has made great progress with the development of deep learning. However, data association still depends heavily on handcrafted constraints, such as appearance, shape, and motion, which need to be elaborately trained for a specific kind of object. In this study, a spatial-temporal encoder-decoder affinity network is proposed for multiple traffic target tracking, with the aim of using deep learning to learn a robust spatial-temporal affinity feature of the detections and tracklets for data association. The proposed network contains a two-stage transformer encoder module that encodes the features of the detections and the tracked targets at the image level and the tracklet level, capturing spatial correlation and temporal history information. A spatial transformer decoder module then computes the association affinity; the outputs of the two-stage transformer encoder module are fed into it so that the spatial and temporal information from the detections and the tracklets of the tracked targets is fully captured and encoded. Efficient affinity computation can thus be applied to perform data association in online tracking. To validate the proposed method, three popular multiple traffic target tracking datasets, KITTI, UA-DETRAC, and VisDrone, are used for evaluation. On the KITTI dataset, the proposed method is compared with 15 SOTA methods and achieves 86.9% multiple object tracking accuracy (MOTA) and 85.71% multiple object tracking precision (MOTP). On the UA-DETRAC dataset, it is compared with 12 SOTA methods and achieves 20.82% MOTA and 35.65% MOTP. On the VisDrone dataset, it is compared with 10 SOTA trackers and achieves 40.5% MOTA and 74.1% MOTP. These experimental results show that the proposed method is competitive with state-of-the-art methods in tracking performance.
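To make the data association step concrete, the sketch below shows how an affinity matrix between existing tracklets and new detections could be turned into matches. It uses a simple greedy assignment with a threshold, which is a stand-in chosen for illustration and not the association procedure described by the authors; the affinity values, the `min_affinity` threshold, and the function name are all hypothetical.

```python
def greedy_associate(affinity, min_affinity=0.5):
    """Greedily match tracklets to detections by descending affinity.

    affinity[i][j] is the score between tracklet i and detection j.
    Returns (matches, unmatched_tracklets, unmatched_detections).
    """
    n_tracks = len(affinity)
    n_dets = len(affinity[0]) if n_tracks else 0

    # Every candidate pair, best score first.
    candidates = sorted(
        ((affinity[i][j], i, j) for i in range(n_tracks) for j in range(n_dets)),
        reverse=True,
    )

    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in candidates:
        if score < min_affinity:
            break  # all remaining pairs score lower, so stop here
        if i in used_tracks or j in used_dets:
            continue  # each tracklet and detection is matched at most once
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)

    unmatched_tracks = [i for i in range(n_tracks) if i not in used_tracks]
    unmatched_dets = [j for j in range(n_dets) if j not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets


# Hypothetical affinities: 2 tracklets (rows) x 3 detections (columns).
affinity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
]
matches, lost_tracks, new_dets = greedy_associate(affinity)
print(matches, lost_tracks, new_dets)
```

In an online tracker, unmatched detections would typically start new tracklets and unmatched tracklets would be kept for a few frames or terminated. An optimal assignment method such as the Hungarian algorithm is a common alternative to the greedy loop used here.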



Articles published on Decoder Module

MID: A Novel Mountainous Remote Sensing Imagery Registration Dataset Assessed by a Coarse-to-Fine Unsupervised Cascading Network

TransText: Improving scene text detection via transformer

AlgaeMask: An Instance Segmentation Network for Floating Algae Detection

PCUNet: A Context-Aware Deep Network for Coarse-to-Fine Point Cloud Completion

Parameter Identification in Power Transmission Systems Based on Graph Convolution Network

Semantic segmentation of bone structures in chest X-rays including unhealthy radiographs: A robust and accurate approach

Progressive multi-branch embedding fusion network for underwater image enhancement

Transformer-Based Global Zenith Tropospheric Delay Forecasting Model

TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection

Relation constraint self-attention for image captioning

HRNet Encoder and Dual-Branch Decoder Framework-Based Scene Text Recognition Model

Pixel-Reasoning-Based Robotics Fine Grasping for Novel Objects with Deep EDINet Structure

Robot Path Planning via Neural-Network-Driven Prediction

Solar Filament Detection Based on Improved DeepLab V3+

Temporal information-guided dynamic dual-tracer PET signal separation network

Multiple Traffic Target Tracking with Spatial-Temporal Affinity Network

Hard Negative Samples Contrastive Learning for Remaining Useful-Life Prediction of Bearings

Trailer hopper automatic detection method for silage harvesting based improved U-Net

MEA-Net: multilayer edge attention network for medical image segmentation

A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery
