Multimodal Fusion Network Research Articles

In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods.

Target detection in autonomous driving is a pivotal domain of the current research focus. The process fundamentally relies on on-board sensors tasked to identify proximate objects. However, long-range perception instability and the partial obstruction by traffic participants further complicate the challenge. These factors collectively affected the effectiveness of targeted detection from the cooperative vehicle-infrastructure system (CVIS) and urgently need to be addressed. Starting from infrastructure-side assisted detection, we propose a 3D detection network based on multi-sensor sensing. Our approach consists of three sub-networks: Diversity Balanced Feature Fusion Network (MRBNeXt), Early Multimodal Fusion Network (VBRFusion), and TwoStage Lightweight Detection Network (TSL). MRBNeXt focuses on extracting raw images fused into multilevel semantic representations to address the drawbacks of needing a rich feature-level representation; VBRFusion proposes a two-branch structure that acts on the point cloud voxelization to aggregate high-dimensional features. The point features are mapped to the sampled graphical semantic features via the coordinate features to complete the early fusion and thus improve the feature quality of a single modality. In the proposed area network, TSL enhances the sensory field processing of multidimensional features using the context-aware module in a two-stage approach to achieve fast target recognition at different scale levels. Finally, we perform comparative ablation experiments on the DAIR-V2X vehicle-infrastructure dataset. The results validated our approach and demonstrated its effectiveness and enhancement in detection accuracy at the infrastructure end compared to current state-of-the-art methods. This improvement significantly boosted the performance of 3D target detection tasks in complex traffic scenarios and provided a more robust justification for the subsequent development of vehicle-side-infrastructure-side collaborative 3D target detection.

Multimodal Fusion Network Research Articles

Related Topics

Articles published on Multimodal Fusion Network

The Analysis of Smarter Future

Degradation-Guided Multi-Modal Fusion Network for Depth Map Super-Resolution

TAG-fusion: Two-stage attention guided multi-modal fusion network for semantic segmentation

Fish behavior recognition based on an audio-visual multimodal interactive fusion network

Multi-Modal Fusion Network with Multi-Head Self-Attention for Injection Training Evaluation in Medical Education

MAFNet: Multimodal Asymmetric Fusion Network for Radar Echo Extrapolation

ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection

Specific Emitter Identification Algorithm Based on Time–Frequency Sequence Multimodal Feature Fusion Network

ETBHD‐HMF: A Hierarchical Multimodal Fusion Architecture for Enhanced Text‐Based Hair Design

MFFNet: Multimodal feature fusion network for RGB-D transparent object detection

3D Multimodal Fusion Network with Disease-induced Joint Learning for Early Alzheimer's Disease Diagnosis.

Multimodal fusion network for ICU patient outcome prediction

A multimodal dual-branch fusion network for fetal hypoxia detection

CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View

FuseNet: a multi-modal feature fusion network for 3D shape classification

Multi-task disagreement-reducing multimodal sentiment fusion network

Infrastructure-assisted 3D detection networks based on camera-lidar early fusion strategy

Sentiment analysis of social media comments based on multimodal attention fusion network

A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis

E-Learning system application in art entrepreneurship teaching based on multimodal feature fusion and neural network

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Multimodal Fusion Network Research Articles

Related Topics

Articles published on Multimodal Fusion Network

The Analysis of Smarter Future

Degradation-Guided Multi-Modal Fusion Network for Depth Map Super-Resolution

TAG-fusion: Two-stage attention guided multi-modal fusion network for semantic segmentation

Fish behavior recognition based on an audio-visual multimodal interactive fusion network

Multi-Modal Fusion Network with Multi-Head Self-Attention for Injection Training Evaluation in Medical Education

MAFNet: Multimodal Asymmetric Fusion Network for Radar Echo Extrapolation

ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection

Specific Emitter Identification Algorithm Based on Time–Frequency Sequence Multimodal Feature Fusion Network

ETBHD‐HMF: A Hierarchical Multimodal Fusion Architecture for Enhanced Text‐Based Hair Design

MFFNet: Multimodal feature fusion network for RGB-D transparent object detection

3D Multimodal Fusion Network with Disease-induced Joint Learning for Early Alzheimer's Disease Diagnosis.

Multimodal fusion network for ICU patient outcome prediction

A multimodal dual-branch fusion network for fetal hypoxia detection

CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View

FuseNet: a multi-modal feature fusion network for 3D shape classification

Multi-task disagreement-reducing multimodal sentiment fusion network

Infrastructure-assisted 3D detection networks based on camera-lidar early fusion strategy

Sentiment analysis of social media comments based on multimodal attention fusion network

A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis

E-Learning system application in art entrepreneurship teaching based on multimodal feature fusion and neural network