Long-range Model Research Articles

Remote sensing images are frequently contaminated by clouds, which often degrade the performance of subsequent applications. Cloud removal is therefore a standard step in remote sensing image preprocessing, and single-image-based thin cloud removal is a well-established area of research. Existing single-image-based thin cloud removal methods, however, cannot perform efficient long-range modeling and account for physical attributes at the same time. To fill this gap, a novel blind single-image-based thin cloud removal method, called the cloud perception integrated fast Fourier convolutional network (CP-FFCN), was designed and implemented. The CP-FFCN consists of two modules: the cloud perception module (CPM) and a fast Fourier convolution (FFC)-conducted reconstruction module (FFCN). The CPM uses a frequency spatial attention mechanism to realize long-range modeling of clouds and to detect them globally in the cloudy image. It helps the CP-FFCN remove the clouds without external prior knowledge of the cloud distribution. The reconstruction module was designed with an FFC-conducted U-Net architecture to recover clean images from cloudy scenes, guided by the cloud locations detected by the CPM. In addition, the FFC blocks deployed in the encoder and decoder of the U-Net architecture selectively learn the attributes of clouds and fog from the frequency spectrograms in order to remove the clouds and reconstruct the underlying ground objects. With the help of these two modules, the CP-FFCN selectively learns frequency features for adequate cloud separation and, at the same time, efficiently models long-range information for comprehensive scene reconstruction. We used Google Earth data and Landsat-8 imagery to train the CP-FFCN model and evaluated it on simulated and naturally occurring cloudy scenes.
The visual results show that the proposed CP-FFCN successfully removes thin clouds and small-scale thick clouds over complex ground object scenes, without external cloud masks or additional reference data. The quantitative analyses further demonstrate that the CP-FFCN is more effective than several other state-of-the-art thin cloud removal methods, yielding a PSNR value over 39.24 and an SSIM value over 0.98 on the Landsat-8 images.
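The long-range property that the abstract attributes to Fourier-domain processing can be illustrated with a minimal NumPy sketch. It is not the CP-FFCN implementation: the function name `spectral_mix`, the random `weights`, and the 16x16 image size are assumptions made for illustration, and a real FFC block operates on multi-channel feature maps with learned layers. The sketch only shows that a pointwise operation on the spectrum has a global receptive field, because changing one input pixel changes outputs across the whole image.

```python
import numpy as np

def spectral_mix(x, weights):
    """Apply a pointwise complex weight to every frequency of a 2-D image.

    x:       real array of shape (H, W)
    weights: complex array of shape (H, W // 2 + 1), matching np.fft.rfft2
    (Hypothetical helper for illustration, not the authors' FFC block.)
    """
    spectrum = np.fft.rfft2(x)              # real FFT over both spatial axes
    mixed = spectrum * weights              # pointwise operation in frequency space
    return np.fft.irfft2(mixed, s=x.shape)  # back to the spatial domain

rng = np.random.default_rng(0)
h, w = 16, 16
weights = (rng.normal(size=(h, w // 2 + 1))
           + 1j * rng.normal(size=(h, w // 2 + 1)))

image = rng.normal(size=(h, w))
perturbed = image.copy()
perturbed[0, 0] += 1.0                      # change a single input pixel

# Compare the outputs for the original and the perturbed image.
diff = spectral_mix(perturbed, weights) - spectral_mix(image, weights)
affected = np.count_nonzero(np.abs(diff) > 1e-12)
print(f"{affected} of {h * w} output pixels changed")
```

A spatial 3x3 convolution would change at most nine output pixels for the same perturbation, which is why stacks of local convolutions need many layers to cover the same range.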

Mainstream multi-object tracking (MOT) methods exploit appearance information and/or motion information to achieve interframe association. However, appearance information struggles with similar-looking objects and occlusion, while motion information is limited by linear assumptions and is prone to failure under nonlinear motion patterns. In this work, we disregard appearance cues and propose a pure motion tracker, STDFormer, to address these issues. It uses a Transformer to estimate complex motion and achieves high-performance tracking with low computing resources. Furthermore, contrastive learning is introduced to optimize the feature representation for robust association. Specifically, we first exploit the long-range modeling capability of the Transformer to mine intention information in temporal motion and decision information in spatial interaction, and we introduce prior detections to constrain the range of motion estimation. Then, we introduce contrastive learning as an auxiliary task to extract reliable motion features for computing affinity, and we introduce bidirectional matching to improve the distribution of the computed affinities. In addition, given that both tasks aim to narrow the embedding distance between the motion features of the tracked object and the detection features, we design a joint-motion-and-association framework that unifies the two tasks for optimization. Experimental results on three benchmark datasets, MOT17, MOT20 and DanceTrack, verify the effectiveness of the proposed method. Compared with state-of-the-art methods, STDFormer sets a new state of the art on DanceTrack and achieves competitive performance on MOT17 and MOT20. This demonstrates the advantage of our method in handling associations under similar appearance, occlusion or nonlinear motion.
At the same time, the significant advantages of the proposed method over Transformer-based and contrastive learning-based methods suggest a new direction for the application of Transformers and contrastive learning in MOT. In addition, to verify how well STDFormer generalizes to unmanned aerial vehicle (UAV) videos, we also evaluate it on VisDrone2019. The results show that STDFormer achieves state-of-the-art performance on VisDrone2019, indicating that it handles small-scale object associations in UAV videos well. The code is available at https://github.com/Xiaotong-Zhu/STDFormer.
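The abstract does not give the formula behind its bidirectional matching, so the following is only a sketch of one common form of the idea, not STDFormer's code. It assumes cosine similarity between track and detection embeddings, a softmax in each direction, and a simple average of the two; the function names, the `temperature` value, and the toy 2-D features are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_affinity(track_feats, det_feats, temperature=0.1):
    """Average of track-to-detection and detection-to-track softmax scores.

    track_feats: (num_tracks, dim) motion embeddings of tracked objects
    det_feats:   (num_dets, dim) embeddings of current detections
    (Assumed formulation for illustration only.)
    """
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    sim = t @ d.T / temperature              # scaled cosine similarity
    track_to_det = softmax(sim, axis=1)      # each track distributes over detections
    det_to_track = softmax(sim, axis=0)      # each detection distributes over tracks
    return 0.5 * (track_to_det + det_to_track)

# Toy example: two tracks, three detections (the third is ambiguous).
tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
dets = np.array([[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]])

aff = bidirectional_affinity(tracks, dets)
best_det = aff.argmax(axis=1)                # greedy pick per track
print(aff.round(3))
print(best_det)
```

Scoring in both directions penalizes a pair that looks good from the track's side but is equally claimed by another track, as with the third detection here. A full tracker would typically pass the affinity matrix to an assignment solver rather than take a per-row argmax.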

Articles published on Long-range Model

HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation.

D-SAT: dual semantic aggregation transformer with dual attention for medical image segmentation

A novel attention-enhanced network for image super-resolution

Entanglement entropy and topological properties in a long-range non-Hermitian Su–Schrieffer–Heeger model

SCA-Former: transformer-like network based on stream-cross attention for medical image segmentation

Global semantic-guided network for saliency prediction

Corner-to-Center long-range context model for efficient learned image compression

A lightweight vision transformer with symmetric modules for vision tasks

Reciprocal transformer for hyperspectral and multispectral image fusion

IoUformer: Pseudo-IoU prediction with transformer for visual tracking

GP-Net: Image Manipulation Detection and Localization via Long-Range Modeling and Transformers

Blind single-image-based thin cloud removal using a cloud perception integrated fast Fourier convolutional network

STDFormer: Spatial-Temporal Motion Transformer for Multiple Object Tracking

Phase classification in the long-range Harper model using machine learning

Phase ordering dynamics of the random-field long-range Ising model in one dimension.

Finite-temperature critical behaviors in 2D long-range quantum Heisenberg model

TransDose: Transformer-based radiotherapy dose prediction from CT images guided by super-pixel-level GCN classification.

X-ray Detection of Prohibited Item Method Based on Dual Attention Mechanism

Window Token Transformer: Can learnable window token help window-based transformer build better long-range interactions?

Component-aware anomaly detection framework for adjustable and logical industrial visual inspection
