Articles published on Pedestrian Detection
- Research Article
- 10.54254/2755-2721/2026.tj29193
- Nov 5, 2025
- Applied and Computational Engineering
- Mingyi Li
Multimodal perception enhances robustness in industrial inspection and mobile robotics by fusing complementary signals when individual modalities falter due to insufficient lighting, specular reflections, motion-induced blurring, or limited textural information. This article synthesizes evidence from peer-reviewed studies and normalizes metrics across representative datasets to characterize what RGB, depth, thermal, LiDAR, radar, and IMU achieve alone and what they achieve in combination. Using MVTec AD, KAIST, TUM VI, and nuScenes as anchors, the synthesis compares miss rate, trajectory error, 3D detection quality, and bird's-eye-view map fidelity while considering latency, power integrity, electromagnetic compatibility, bandwidth, and maintainability. The concordant findings reveal that color-thermal integration significantly diminishes failures in pedestrian detection under low-illumination conditions, while tightly integrated visual-inertial systems curtail drift compared to purely visual odometry. Furthermore, bird's-eye-view integration enhances 3D detection and mapping performance relative to camera-only or LiDAR-only benchmarks. The analysis also identifies system prerequisites that enable reproducible gains (precise timing, disciplined calibration, robust power and electromagnetic practice, and sufficient bandwidth) and concludes with implementation guidelines to help transfer benchmark-reported benefits to factory floors and field robots.
- Research Article
- 10.3390/ijgi14110438
- Nov 5, 2025
- ISPRS International Journal of Geo-Information
- Juyeon Cho + 1 more
Detecting anomalous pedestrian behaviors is critical for enhancing safety in dense urban environments, particularly in complex back streets where movement patterns are irregular and context-dependent. While extensive research has been conducted on trajectory-based anomaly detection for vehicles, ships, and aircraft, few studies have focused on pedestrians, whose behaviors are strongly influenced by surrounding spatial and environmental conditions. This study proposes a pedestrian anomaly detection framework based on a Variational Autoencoder (VAE), designed to identify and interpret abnormal trajectories captured by large-scale Closed-Circuit Television (CCTV) systems in urban back streets. The framework extracts 14 movement features across point, trajectory, and grid levels, and employs the VAE to learn normal movement patterns and detect deviations from them. A total of 1.88 million trajectories were analyzed, and approximately 1.05% were identified as anomalous. These were further categorized into three behavioral types—wandering, slow-linear, and stationary—through clustering analysis. Contextual interpretation revealed that anomaly types differ substantially by time of day, spatial configuration, and weather conditions. The final optimized model achieved an accuracy of 97.80% and an F1-score of 94.63%, demonstrating its strong capability to detect abnormal pedestrian movement while minimizing false alarms. By integrating deep learning with contextual urban analytics, this study contributes to data-driven frameworks for real-time pedestrian safety monitoring and spatial risk assessment in complex urban environments.
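The scoring step of such a framework can be sketched in a few lines: reconstruction error from a trained autoencoder serves as the anomaly score, and a quantile threshold is chosen to match a target anomaly rate (the study reports roughly 1.05%). The `reconstruct` stand-in and the toy data below are illustrative assumptions, not the authors' VAE.

```python
import numpy as np

def anomaly_scores(x, reconstruct):
    """Per-sample reconstruction error (MSE) used as the anomaly score."""
    x_hat = reconstruct(x)
    return np.mean((x - x_hat) ** 2, axis=1)

def flag_anomalies(scores, rate=0.0105):
    """Flag the top `rate` fraction of scores as anomalous via a quantile cut."""
    thresh = np.quantile(scores, 1.0 - rate)
    return scores > thresh, thresh

# Toy demo: a tight cluster of "normal" 14-feature vectors plus 10 outliers;
# the stand-in "VAE" reconstructs everything to the normal mean (zero).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (990, 14)), rng.normal(0, 2.0, (10, 14))])
flags, _ = flag_anomalies(anomaly_scores(data, np.zeros_like), rate=0.01)
```

In the real pipeline the reconstruction would come from the VAE's encode/decode pass over the 14 point-, trajectory-, and grid-level features, and the flagged trajectories would then feed the clustering stage.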
- Research Article
- 10.1371/journal.pone.0334786
- Nov 4, 2025
- PLOS One
- Guofeng Qin + 4 more
To address the challenges of low accuracy, high missed-detection rates, and poor tracking stability in pedestrian detection and tracking under dense occlusion and small object scenarios on traffic roads, this paper proposes a pedestrian detection and tracking algorithm based on improved YOLOv5s and DeepSORT. For the improvements in the YOLOv5s detection network, first, the Focal-EIoU loss function is used to replace the CIoU loss function. Second, a 160 × 160-pixel Small Object (SO) detection layer is added to the Neck structure. Finally, the Multi-Head Self-Attention (MHSA) mechanism is introduced into the Backbone network to enhance the model’s detection performance. Regarding the improvements in the DeepSORT tracking framework, a lightweight ShuffleNetV2 network is integrated into the appearance feature extraction network, reducing the number of model parameters while maintaining accuracy. Experimental results show that the improved YOLOv5s achieves an mAP@0.5 of 80.8% and an mAP@0.5:0.95 of 49.7%, representing increases of 4.4% and 3.9%, respectively, compared to the original YOLOv5s. The enhanced YOLOv5s-DeepSORT achieves an MOTA of 50.7% and an MOTP of 77.3%, improving by 3.3% and 0.5%, respectively, over the original YOLOv5s-DeepSORT. Additionally, the number of identity switches (IDs) is reduced by 11.3%, and the model size is reduced to 20% of the original algorithm, enhancing its portability. The proposed method demonstrates strong robustness and can effectively track targets of different sizes.
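The loss-function swap can be illustrated in isolation. The sketch below is a minimal single-box Focal-EIoU following the published formulation (1 − IoU plus center-distance and width/height penalties over the enclosing box, reweighted by IoU^γ); the `(x1, y1, x2, y2)` box encoding and γ = 0.5 are generic assumptions, not details from this paper.

```python
def iou_and_enclosure(b1, b2):
    """b = (x1, y1, x2, y2). Returns IoU plus the enclosing box's width/height."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    iou = inter / (a1 + a2 - inter)
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])
    return iou, cw, ch

def focal_eiou(pred, gt, gamma=0.5):
    """Single-box Focal-EIoU loss (sketch): EIoU terms reweighted by IoU**gamma."""
    iou, cw, ch = iou_and_enclosure(pred, gt)
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    d2 = (px - gx) ** 2 + (py - gy) ** 2          # center distance
    dw2 = ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2  # width mismatch
    dh2 = ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2  # height mismatch
    eiou = 1 - iou + d2 / (cw**2 + ch**2) + dw2 / cw**2 + dh2 / ch**2
    return iou ** gamma * eiou  # focal reweighting emphasizes high-IoU boxes
```

Relative to CIoU, the separate width and height penalties let the regressor converge on each dimension independently, which is the motivation the Focal-EIoU authors give for the swap.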
- Research Article
- 10.1016/j.aei.2025.103653
- Nov 1, 2025
- Advanced Engineering Informatics
- He Li + 2 more
MSASW: Multi-view scale-aware pedestrian detection with shared-weight supervision
- Research Article
- 10.1016/j.ins.2025.122263
- Nov 1, 2025
- Information Sciences
- Zhe Jia + 6 more
Adversarial Infrared Catmull-Rom Spline: A black-box attack on infrared pedestrian detectors in the physical world
- Research Article
- 10.1109/tnnls.2025.3624356
- Oct 31, 2025
- IEEE Transactions on Neural Networks and Learning Systems
- Wenliang Ge + 3 more
Pedestrian detection is crucial in practical applications, such as autonomous driving and video surveillance. However, the existing research mainly focuses on improving detection accuracy, with relatively little attention paid to model complexity and operational efficiency. In scenarios with high real-time requirements, the practical deployment of pedestrian detectors still faces many difficulties. To this end, we propose a lightweight and efficient pedestrian detection network (LEPD-Net). First, we design a PoolFormer-based detection head (PDH) to reduce the model computation and inference time. Second, to compensate for the deficiency of PDH in global context modeling, we design a triple-branch joint attention module (TJAM). TJAM uses only a small number of parameters and strengthens the model's contextual representation by capturing spatial location dependencies and global semantic information between channels. Finally, after incorporating PDH and TJAM into the backbone network, a lightweight and efficient pedestrian detector is constructed. We benchmarked the model on the mainstream pedestrian datasets Caltech and CityPersons. The results show that our model achieves the current state-of-the-art performance level. In addition, our model reduces inference time by 25% while maintaining accuracy.
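The PoolFormer idea underlying the PDH can be sketched in a few lines: the expensive token mixer (self-attention) is replaced by plain average pooling with the input subtracted. This is a generic illustration of PoolFormer-style mixing on an HWC feature map, not the actual PDH implementation.

```python
import numpy as np

def pool_token_mixer(x, pool=3):
    """PoolFormer-style token mixing: windowed average pooling minus identity.
    A cheap, parameter-free stand-in for self-attention over an (H, W, C) map."""
    h, w, c = x.shape
    pad = pool // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            # Mean over the local pool x pool spatial window, per channel.
            out[i, j] = xp[i:i + pool, j:j + pool].mean(axis=(0, 1))
    return out - x  # PoolFormer subtracts the input from the pooled map
```

Because the mixer has no learned weights, the head's cost shifts almost entirely to the channel MLPs, which is where the computation and inference-time savings come from.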
- Research Article
- 10.1002/cpe.70395
- Oct 29, 2025
- Concurrency and Computation: Practice and Experience
- Xiang Gu + 4 more
ABSTRACT This study addresses the challenge of pedestrian detection in low‐light conditions, in which traditional detection models often suffer performance degradation due to insufficient illumination and low contrast. We propose a novel detection model, YOLO‐LFormer, which integrates low‐light image enhancement with a lightweight vision transformer. A lightweight YUV transformer‐based network for low‐light image enhancement (LYT‐Net) is employed to enhance image brightness and details, while a MobileViTv3 backbone network combines CNN and transformer structures to extract local and global features. The temporal–spatial attention (TSA) mechanism and reparameterized convolution with channel shuffle (RCS) are introduced to enhance feature representation, and the Wise‐IOUv3 loss function optimizes bounding box regression. Experiments on the BDD100K low‐light dataset demonstrate that YOLO‐LFormer achieves 78.42% and 44.35% on mAP@0.5 and mAP@0.5:0.95, respectively, outperforming various mainstream detection models. This approach offers high accuracy, real‐time performance, and suitability for resource‐constrained practical scenarios.
- Research Article
- 10.3390/computers14110459
- Oct 24, 2025
- Computers
- Yongheng Zhang
Image restoration tasks such as deraining, deblurring, and dehazing are crucial for enhancing the environmental perception of autonomous vehicles and traffic systems, particularly for tasks like vehicle detection, pedestrian detection, and lane line identification. While transformer-based models excel in these tasks, their prohibitive computational complexity hinders real-world deployment on resource-constrained platforms. To bridge this gap, this paper introduces a novel Soft Knowledge Distillation (SKD) framework, designed specifically for creating highly efficient yet powerful image restoration models. Our core innovation is twofold: first, we propose a Multi-dimensional Cross-Net Attention (MCA) mechanism that allows a compact student model to learn comprehensive attention relationships from a large teacher model across both spatial and channel dimensions, capturing fine-grained details essential for high-quality restoration. Second, we pioneer the use of a contrastive learning loss at the reconstruction level, treating the teacher’s outputs as positives and the degraded inputs as negatives, which significantly elevates the student’s reconstruction quality. Extensive experiments demonstrate that our method achieves a superior trade-off between performance and efficiency, notably enhancing downstream tasks like object detection. The primary contributions of this work lie in delivering a practical and compelling solution for real-time perceptual enhancement in autonomous systems, pushing the boundaries of efficient model design.
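The reconstruction-level contrastive idea can be sketched as a ratio of distances: pull the student's output toward the teacher's restoration (positive) and push it away from the degraded input (negative). The ratio form and the L1 distance below are common choices assumed for illustration, not necessarily the paper's exact loss.

```python
import numpy as np

def contrastive_recon_loss(student_out, teacher_out, degraded_in, eps=1e-8):
    """Contrastive reconstruction loss (sketch): distance to the teacher's
    output (positive) divided by distance to the degraded input (negative).
    Minimizing it rewards matching the teacher AND moving off the input."""
    pos = np.mean(np.abs(student_out - teacher_out))
    neg = np.mean(np.abs(student_out - degraded_in))
    return pos / (neg + eps)
```

Compared with a plain L1 distillation term, the denominator explicitly penalizes the degenerate solution of copying the degraded input through, which is the stated motivation for using the input as a negative.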
- Research Article
- 10.1016/j.measurement.2025.118009
- Oct 1, 2025
- Measurement
- Wajdi Farhat + 3 more
Pedestrian detection and tracking using an enhanced YOLOv9 model for automotive vehicles
- Research Article
- 10.3390/signals6040053
- Oct 1, 2025
- Signals
- Md Reasad Zaman Chowdhury + 5 more
Autonomous driving has emerged as a rapidly advancing field in both industry and academia over the past decade. Among the enabling technologies, computer vision (CV) has demonstrated high accuracy across various domains, making it a critical component of autonomous vehicle systems. However, CV tasks are computationally intensive and often require hardware accelerators to achieve real-time performance. Field Programmable Gate Arrays (FPGAs) have gained popularity in this context due to their reconfigurability and high energy efficiency. Numerous researchers have explored FPGA-accelerated CV solutions for autonomous driving, addressing key tasks such as lane detection, pedestrian recognition, traffic sign and signal classification, vehicle detection, object detection, environmental variability sensing, and fault analysis. Despite this growing body of work, the field remains fragmented, with significant variability in implementation approaches, evaluation metrics, and hardware platforms. Crucial performance factors, including latency, throughput, power consumption, energy efficiency, detection accuracy, datasets, and FPGA architectures, are often assessed inconsistently. To address this gap, this paper presents a comprehensive literature review of FPGA-accelerated, vision-based autonomous driving systems. It systematically examines existing solutions across sub-domains, categorizes key performance factors, and synthesizes the current state of research. This study aims to provide a consolidated reference for researchers, supporting the development of more efficient and reliable next-generation autonomous driving systems by highlighting trends, challenges, and opportunities in the field.
- Research Article
- 10.1016/j.array.2025.100563
- Oct 1, 2025
- Array
- Zhenhua Han + 2 more
Urban road pedestrian detection system integrating IoT technology and multi-sensor data fusion algorithm
- Research Article
- 10.1016/j.dsp.2025.105343
- Oct 1, 2025
- Digital Signal Processing
- Zenghui Qu + 7 more
LP-YOLO: An improved lightweight pedestrian detection algorithm based on YOLOv11
- Research Article
- 10.1016/j.jpdc.2025.105137
- Oct 1, 2025
- Journal of Parallel and Distributed Computing
- Riadh Ayachi + 3 more
Lightweight path aggregation network for pedestrian detection on FPGA board
- Research Article
- 10.3390/app151910607
- Sep 30, 2025
- Applied Sciences
- Lijuan Wang + 2 more
In security applications, visible-light pedestrian detectors are highly sensitive to changes in illumination and fail under low-light or nighttime conditions, while infrared sensors, though resilient to lighting, often produce blurred object boundaries that hinder precise localization. To address these complementary limitations, we propose a practical multimodal pipeline—Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12—that first fuses infrared and low-light visible images using per-pixel weights derived from local energy, gradient magnitude and contrast measures, then detects pedestrians with an improved YOLOv12 backbone. The detector integrates an AIFI attention module at high semantic levels, replaces selected modules with A2C2f blocks to enhance cross-channel feature aggregation, and preserves P3–P5 outputs to improve small-object localization. We evaluate the complete pipeline on the LLVIP dataset and report Precision, Recall, mAP@50, mAP@50–95, GFLOPs, FPS and detection time, comparing against YOLOv8, YOLOv10–YOLOv12 baselines (n and s scales). Quantitative and qualitative results show that the proposed fusion restores complementary thermal and visible details and that the AIFI-enhanced detector yields more robust nighttime pedestrian detection while maintaining a competitive computational profile suitable for real-world security deployments.
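The fusion step can be sketched independently of the detector: per-pixel weights are derived from activity measures on each modality and used for a convex combination of the infrared and visible images. The specific measures below (squared intensity for energy, gradient magnitude, deviation from the global mean for contrast) are simplified stand-ins for the paper's local EGC measures.

```python
import numpy as np

def saliency(img):
    """Per-pixel activity map: energy + gradient magnitude + contrast.
    (Global, whole-image simplifications of the paper's local measures.)"""
    f = img.astype(float)
    gy, gx = np.gradient(f)
    energy = f ** 2
    grad = np.hypot(gx, gy)
    contrast = np.abs(f - f.mean())
    return energy + grad + contrast

def egc_fuse(ir, vis, eps=1e-8):
    """Per-pixel weighted fusion of infrared and visible images."""
    s_ir, s_vis = saliency(ir), saliency(vis)
    w_ir = s_ir / (s_ir + s_vis + eps)  # weights normalized to [0, 1)
    return w_ir * ir + (1 - w_ir) * vis
```

Because the weights are normalized per pixel, each fused pixel stays between the two source values, so the fused image inherits thermal detail where the IR channel is active and visible texture elsewhere.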
- Research Article
- 10.3390/s25185908
- Sep 21, 2025
- Sensors (Basel, Switzerland)
- Anxin Zhao + 2 more
Pedestrian intrusion in coal yard work areas is a major cause of accidents, posing challenges for the safe supervision of coal yards. Existing visual detection methods suffer under poor lighting and a lack of 3D data. To overcome these limitations, this study introduces a robust pedestrian intrusion detection method based on 3D LiDAR. Our approach consists of three main components. First, we propose a novel pedestrian detection network called EFT-RCNN. Based on Voxel-RCNN, this network introduces an EnhancedVFE module to improve spatial feature extraction, employs FocalConv to reconstruct the 3D backbone network for enhanced foreground–background distinction, and utilizes TeBEVPooling to optimize bird’s eye view (BEV) generation. Second, a precise 3D hazardous area is defined by combining a polygonal base surface, determined through on-site exploration, with height constraints. Finally, a point–region hierarchical judgment method is designed to calculate the spatial relationship between pedestrians and the hazardous area for graded warning. When evaluated on the public KITTI dataset, the EFT-RCNN network improved the average precision for pedestrian detection by 4.39% in 3D and 4.68% in BEV compared with the baseline, while maintaining a real-time processing speed of 28.56 FPS. In practical tests, the pedestrian detection accuracy reached 92.9%, with an average error in distance measurement of 0.054 m. The experimental results demonstrate that the proposed method effectively mitigates complex environmental interference, enables robust detection, and provides a reliable means for the proactive prevention of pedestrian intrusion accidents.
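The point–region judgment reduces to a point-in-polygon test on the base surface plus a height check. The ray-casting routine below is a generic sketch of that geometric step under the zone definition described (polygonal base plus height constraints), not the paper's hierarchical graded-warning logic.

```python
from typing import List, Tuple

def point_in_polygon(x: float, y: float, poly: List[Tuple[float, float]]) -> bool:
    """Ray-casting test against the polygonal base surface of the hazard zone."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def in_hazard_zone(p: Tuple[float, float, float],
                   base_poly: List[Tuple[float, float]],
                   z_min: float, z_max: float) -> bool:
    """3D zone membership = base polygon test + height constraint."""
    x, y, z = p
    return z_min <= z <= z_max and point_in_polygon(x, y, base_poly)
```

In the full system this binary membership would be refined into graded warnings using the detected pedestrian's distance to the zone boundary.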
- Research Article
- 10.3390/s25185760
- Sep 16, 2025
- Sensors (Basel, Switzerland)
- Ohoud Alzamzami + 5 more
The advancement of Artificial Intelligence (AI) and the Internet of Things (IoT) has accelerated the development of Intelligent Transportation Systems (ITS) in smart cities, playing a crucial role in optimizing traffic flow, enhancing road safety, and improving the driving experience. With urban traffic becoming increasingly complex, timely detection and response to congestion and accidents are critical to ensuring safety and situational awareness. This paper presents Passable, an intelligent and adaptive traffic light control system that monitors traffic conditions in real time using deep learning and computer vision. By analyzing images captured from cameras at traffic lights, Passable detects road incidents and dynamically adjusts signal timings based on current vehicle density. It also employs wireless communication to alert drivers and update a centralized dashboard accessible to traffic management authorities. A working prototype integrating both hardware and software components was developed and evaluated. Results demonstrate the feasibility and effectiveness of designing an adaptive traffic signal control system that integrates incident detection, instantaneous communication, and immediate reporting to the relevant authorities. Such a design can enhance traffic efficiency and contribute to road safety. Future work will involve testing the system with real-world vehicular communication technologies on multiple coordinated intersections while integrating pedestrian and emergency vehicle detection.
- Research Article
- 10.1080/10095020.2025.2547947
- Sep 5, 2025
- Geo-spatial Information Science
- Lin Qi + 6 more
ABSTRACT The pedestrian tracking and motion detection system (P-TMDS) using distributed inertial sensors has broad application potential toward many emerging fields, such as motion tracking, emergency rescue, and others, due to its advanced autonomous navigation capabilities under signal-denied scenarios. The performance of current P-TMDS is constrained by the cumulative error of low-cost sensors, low accuracy of human motion detection, and lack of effective multi-sensor integration algorithms. This paper proposes a motion-constrained P-TMDS based on the adaptive integration of distributed inertial sensors and ultrasonic ranging (MP-TMDS). An enhanced position–attitude update algorithm is developed for the single-sensor module, which integrates the inertial navigation system (INS) mechanization with multi-level constraints and observations. In addition, a bi-directional long short-term memory (Bi-LSTM) structure is adopted to detect the outlier in ultrasonic ranging results and provide accurate distance observations for dual sensor module-based positioning systems. For the overall MP-TMDS, the measurements provided by distributed sensor modules and ultrasonic ranging are adopted as the input vector of designed spatial–temporal network training for human motion detection and walking speed estimation, and the detected human motion modes are further applied as the constraints for multi-module position–attitude update. Finally, an enhanced data and model dual-driven structure is proposed to adaptively integrate motion features acquired from distributed sensor modules and results of velocity and motion detection provided by spatial–temporal network. 
Real-world experiments in complex scenes demonstrate that the developed MP-TMDS effectively increases the precision of traditional P-TMDS and outperforms existing algorithms on both positioning and motion-detection accuracy metrics, with an estimated accuracy improvement of more than 18.4% compared with state-of-the-art algorithms.
- Research Article
- 10.1007/s13369-025-10568-1
- Sep 1, 2025
- Arabian Journal for Science and Engineering
- Hoang N Tran + 4 more
Multi-Task Real-Time 3D LiDAR Perception with Attention-Enhanced MobilePIXOR for Obstacle Segmentation and Pedestrian Detection in Autonomous Robots
- Research Article
- 10.35629/5252-0709432436
- Sep 1, 2025
- International Journal of Advances in Engineering and Management
- Jinpeng Song
With the advancement of intelligent sports monitoring systems, detecting individuals alongside running tracks has emerged as a critical task. Building upon YOLOv8, this paper introduces the Convolutional Block Attention Module (CBAM) attention mechanism and further optimises its channel attention module into Efficient Multi-scale Attention (EMA), thereby constructing three detection models: the original YOLOv8, YOLOv8+CBAM, and YOLOv8+CBAM(EMA). Experiments were conducted on the self-built SideView Person Dataset, a track-side pedestrian dataset, to evaluate each model's detection accuracy and robustness. The results demonstrate that after incorporating the EMA attention mechanism, the model achieved an mAP@0.5 of 0.879, surpassing the original YOLOv8's 0.868, thereby validating the effectiveness of EMA in optimising channel attention.
- Research Article
- 10.1016/j.image.2025.117421
- Sep 1, 2025
- Signal Processing: Image Communication
- Xiaobiao Dai + 4 more
Multi-Exposure Image Enhancement and YOLO Integration for Nighttime Pedestrian Detection