Comprehensive Experiments Research Articles

Pattern recognition through the fusion of RGB frames and event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to extract the features of RGB frames and event streams separately, and then fuse these features for pattern recognition. However, we posit that these methods may suffer from two key issues: (1) they attempt to directly learn a mapping from the input vision modality to the semantic labels, which often leads to sub-optimal results because of the disparity between the input and the semantic labels; (2) they use small-scale backbone networks to extract RGB and event features, and therefore fail to harness the recent performance advances of large-scale vision–language models. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision–language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we first convert them into language descriptions through prompt engineering, polish the descriptions using ChatGPT, and then obtain the semantic features using the pre-trained large-scale language model (CLIP text encoder). Subsequently, we integrate the RGB/event features and semantic features using multimodal Transformer networks. The resulting frame and event tokens are further amplified using self-attention layers. Concurrently, we propose to enhance the interactions between text tokens and RGB/event tokens via cross-attention. Finally, we consolidate all three modalities using self-attention and feed-forward layers for recognition. Comprehensive experiments on the HARDVS and PokerEvent datasets substantiate the efficacy of our proposed SAFE model.
The source code has been released at https://github.com/Event-AHU/SAFE_LargeVLM.
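
The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product attention. This is not the released SAFE code: the function names, the toy token values, and the choice of text tokens as queries over RGB/event tokens are assumptions made for illustration, and the learned query/key/value projections a real Transformer layer would apply are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of d-dimensional vectors (here: hypothetical text tokens)
    keys, values: lists of vectors from the other modality
                  (here: hypothetical RGB/event tokens)
    Learned projection matrices are deliberately left out of this sketch.
    """
    d = len(keys[0])
    dim_out = len(values[0])
    fused = []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim_out)]
        )
    return fused


# Toy example: two text tokens attend over three visual tokens.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_tokens = cross_attention(text_tokens, visual_tokens, visual_tokens)
```

Each output token is a convex combination of the visual tokens, weighted by how similar they are to the corresponding text token. The opposite direction (visual tokens querying text tokens) uses the same function with the arguments swapped.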

In scenarios where global navigation satellite systems (GNSSs) and radio navigation systems are denied, vision-based autonomous landing (VAL) for fixed-wing unmanned aerial vehicles (UAVs) becomes essential. Accurate and real-time runway detection in VAL is vital for providing precise positional and orientational guidance. However, existing research faces significant challenges, including insufficient accuracy, inadequate real-time performance, poor robustness, and high susceptibility to disturbances. To address these challenges, this paper introduces a novel single-stage, anchor-free, and decoupled vision-based runway detection framework, referred to as YOLO-RWY. First, an enhanced data augmentation (EDA) module is incorporated to perform various augmentations, enriching image diversity and introducing perturbations that improve generalization and safety. Second, a large separable kernel attention (LSKA) module is integrated into the backbone structure to provide a lightweight attention mechanism with a broad receptive field, enhancing feature representation. Third, the neck structure is reorganized as a bidirectional feature pyramid network (BiFPN) module with skip connections and attention allocation, enabling efficient multi-scale and cross-stage feature fusion. Finally, the regression loss and task-aligned learning (TAL) assigner are optimized using efficient intersection over union (EIoU) to improve localization evaluation, resulting in faster and more accurate convergence. Comprehensive experiments demonstrate that YOLO-RWY achieves AP50:95 scores of 0.760, 0.611, and 0.413 on the synthetic, real nominal, and real edge test sets of the landing approach runway detection (LARD) dataset, respectively. Deployment experiments on an edge device show that YOLO-RWY achieves an inference speed of 154.4 FPS at FP32 precision with an image size of 640.
The results indicate that the proposed YOLO-RWY model possesses strong generalization and real-time capabilities, enabling accurate runway detection in complex and challenging visual environments, and providing support for the onboard VAL systems of fixed-wing UAVs.
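
To make the EIoU term in the abstract concrete, the following is a minimal sketch of the commonly used EIoU loss for a single pair of axis-aligned boxes. It is not the YOLO-RWY implementation: the function name, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1.

    loss = 1 - IoU
           + (squared center distance) / (squared enclosing-box diagonal)
           + (squared width difference) / (squared enclosing-box width)
           + (squared height difference) / (squared enclosing-box height)
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both inputs.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Penalty for the distance between box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate penalties for width and height mismatch.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


# Toy example: a prediction shifted and slightly wider than the target.
example_loss = eiou_loss((0.0, 0.0, 2.5, 2.0), (0.5, 0.0, 2.5, 2.0))
```

Unlike plain IoU, the loss stays informative when the boxes do not overlap, because the center, width, and height terms still change as the prediction moves toward the target.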

Articles published on Comprehensive Experiments

EHNet: Efficient Hybrid Network with Dual Attention for Image Deblurring

Semantic-aware frame-event fusion based pattern recognition via large vision–language models

GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction

A multi-source domain feature adaptation network for potato disease recognition in field environment

YOLO-RWY: A Novel Runway Detection Model for Vision-Based Autonomous Landing of Fixed-Wing Unmanned Aerial Vehicles

GSSCL: A framework for Graph Self-Supervised Curriculum Learning based on clustering label smoothing

Advancing epigenetic profiling in cervical cancer: machine learning techniques for classifying DNA methylation patterns

Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights

Coupling effects of irrigation amount and fertilization rate on growth and bioactive components of four-year-old licorice (Glycyrrhiza uralensis Fisch) in arid regions of Xinjiang

Meta Learning to Rank for Sparsely Supervised Queries

Enhancing Graph Neural Networks via Memorized Global Information

Transfer Learning Enabled Modeling Paradigm for PVT-aware Circuit Performance Estimation

Changes in the Phylogenetic Structure of Alpine Grassland Plant Communities on the Qinghai-Tibetan Plateau with Long-Term Nitrogen Deposition

DAPLSR: Data Augmentation Partial Least Squares Regression Model via Manifold Optimization

Multi‐objective based container placement strategy in CaaS

Enhanced electrical and mechanical properties of additively manufactured pure copper with green laser

Semi-supervised action recognition with dynamic temporal information fusion

BioSAM: Generating SAM Prompts From Superpixel Graph for Biological Instance Segmentation

Soft-label recover based label-specific features learning

Improving Crystal Property Prediction from a Multiplex Graph Perspective
