A fully locally selective large kernel network for traffic video detection

Yue Hou, Zhihao Zhang, Lixia Du, Jie Yin

doi:10.1016/j.measurement.2024.115779

Abstract

Vision Transformers (ViTs) are widely used in traffic video detection because of their large receptive field and strong ability to capture global context. However, ViT models extract local detail features inadequately and have high computational complexity. Convolutional neural networks (CNNs) are also commonly used for video image detection. Although CNNs are more flexible and less computationally expensive when extracting local details, they are limited by their fixed-size kernels. They cannot make full use of the local context, which makes it hard to extract tiny targets with uneven scale distribution in a region and leads to missed and false detections of small targets. To address this, this study proposes a new fully localized selective large kernel network (FL-SLKNet). The network captures specific targets and background information through a fully localized feature extraction method, which makes the features of small targets in the region more distinctive. On this basis, the Adaptive Expansion Residual Module (AERM) is proposed. It addresses the loss of image resolution caused by receptive field expansion by combining pointwise convolution with dilated convolution and using residual connections to restore the lost information. To identify tiny targets in local areas, the study also proposes the Multi-Scale Frequency Domain Encoder (MSFDEncoder), which uses a high-frequency branch to capture local details and a low-frequency branch to focus on global structure, capturing fine-grained features of targets at various scales. Experimental results on the VisDrone2019-DET and BDD-100K datasets indicate that FL-SLKNet outperforms other advanced models in traffic video detection, improving mAP0.5 by 3.3% and 3.4%, respectively, over the RT-DETR model. FL-SLKNet also shows excellent detection performance compared with the YOLO series on the SZ Actual Traffic Video dataset. In addition, in simulated rainy and hazy scenarios, FL-SLKNet's mAP0.5 is 5.5% and 2.6% higher than that of the RT-DETR model, indicating better robustness in extreme weather.
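The abstract does not specify the layers inside AERM, so the following is only a minimal 1-D sketch of the general idea it names: a pointwise step, a dilated convolution that widens the receptive field without downsampling, and a residual connection that adds the input back. All function names, the kernel, and the weight `w` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Illustrative 1-D sketch only. Names and weights are hypothetical,
# not the authors' AERM implementation.

def pointwise(x, w):
    """Pointwise (1x1) convolution on a single-channel signal: a per-sample scale."""
    return [w * v for v in x]


def dilated_conv(x, kernel, dilation):
    """Zero-padded dilated convolution (cross-correlation) that keeps the input length."""
    k = len(kernel)
    span = (k - 1) * dilation            # distance between first and last tap
    pad_left = span // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def effective_kernel_size(k, dilation):
    """Receptive field of one dilated layer: k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)


def residual_dilated_block(x, kernel, dilation, w=1.0):
    """Pointwise step, then dilated convolution, then add the input back."""
    y = dilated_conv(pointwise(x, w), kernel, dilation)
    return [a + b for a, b in zip(x, y)]
```

With a 3-tap kernel and dilation 2, `effective_kernel_size(3, 2)` is 5, and `residual_dilated_block([0, 0, 0, 1, 0, 0, 0], [1, 1, 1], 2)` returns a 7-element list: the output keeps the input resolution, and the original impulse is still present at its position because of the residual term.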

More From: Measurement

