Articles published on Object Detection
32773 Search results
- New
- Research Article
- 10.1007/s10278-026-01855-w
- Feb 6, 2026
- Journal of imaging informatics in medicine
- Yan Wang + 8 more
Early and accurate diagnosis of nasopharyngeal-laryngeal tumors is critical for improving patient prognosis. Deep learning methods have achieved significant progress in the automatic detection of lesions in static endoscopic images. However, during nasopharyngeal-laryngeal endoscopy, the quality of endoscopic videos often suffers from motion blur, uneven exposure, and reflective artifacts, which adversely affect the performance of existing static image detectors. Therefore, we propose a novel two-stage video lesion detection network, DynSTPN, to address the challenge of lesion detection in complex scenarios. First, in the prompt generation network stage, we design a dynamic prompt generator that generates discriminative prompts based on spatio-temporal feature representations of reference frames to mitigate quality degradation in inference frames. Second, in the object detection network stage, we introduce an adaptive differentiable gating mechanism to integrate the reference frames' prompt information, dynamically adjusting the enhancement effect of reference frames on the inference frame. Experiments were conducted on two datasets: the self-constructed four-category nasopharyngeal-laryngeal lesion video object detection (NLLVOD) dataset and the publicly available ImageNet VID dataset. Compared to state-of-the-art (SOTA) methods, DynSTPN achieved the best balance between detection accuracy and efficiency on the VID dataset. On the NLLVOD dataset, DynSTPN achieved a detection accuracy of 79.6% and a speed of 29.4 FPS, meeting the real-time requirements for clinical applications. These results significantly outperform the SOTA static image detector YOLOv12-M. Experimental results demonstrate that DynSTPN effectively leverages information from video reference frames to enhance detection performance, achieving superior accuracy compared to SOTA image and video methods, thereby offering enhanced clinical applicability.
- New
- Research Article
- 10.1038/s41598-026-39084-4
- Feb 6, 2026
- Scientific reports
- Qingya Ouyang + 2 more
Soccer video analysis has significant application value in sports broadcasting, tactical research, and athlete training, with accurate object detection serving as the key foundation for automated analysis. Soccer object detection typically improves performance through enhanced feature representation and optimized network architectures, but these methods assume that models can automatically learn discriminative features of targets. Through experiments, we reveal the "feature collapse" phenomenon in soccer detection, where features of players from the same team are excessively clustered in high-dimensional space, and soccer ball features degenerate to near background noise. Furthermore, existing methods lack progressive feature evolution mechanisms, resulting in insufficient discriminative capability when handling dense scenes. To address these issues, we propose DeCon-Net, which contains a Decoupled Feature Learning Module (DFLM) and a Hierarchical Contrastive Constraint Module (HCCM). Specifically, DFLM designs dual-stream encoders to extract appearance features and identity features separately, forcing the identity stream to learn truly discriminative representations through mutual exclusivity constraints. HCCM adopts dynamic threshold contrastive learning, adaptively adjusting learning intensity based on feature distances between sample pairs, achieving progressive optimization from coarse to fine granularity. Experimental results demonstrate that DeCon-Net achieves significant performance improvements on the SportsMOT and SoccerNet-Tracking datasets, particularly showing substantial gains in ball detection.
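The dynamic-threshold idea in HCCM, adjusting how strongly a sample pair is pushed or pulled based on the distance between its features, can be illustrated with a toy pairwise loss. This is a minimal sketch under assumed names and a made-up margin schedule, not the paper's actual formulation:

```python
import math

def dynamic_contrastive_loss(za, zb, same_identity, base_margin=1.0, scale=0.5):
    """Toy pairwise contrastive loss with a distance-adaptive margin.

    Same-identity pairs are pulled together (squared distance penalty);
    different-identity pairs are pushed apart with a hinge whose effective
    strength shrinks as the pair is already far apart, an illustrative
    stand-in for a "dynamic threshold" (not the authors' HCCM).
    """
    d = math.dist(za, zb)  # Euclidean distance between feature vectors
    if same_identity:
        return d ** 2
    margin = base_margin + scale * d  # hypothetical dynamic threshold
    return max(0.0, margin - d) ** 2
```

Under this schedule, distant different-identity pairs incur little or no loss while close ones are pushed apart, loosely mirroring the coarse-to-fine intensity adjustment described above.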
- New
- Research Article
- 10.3390/photonics13020156
- Feb 6, 2026
- Photonics
- Chen Zuo + 1 more
Optical Wireless Power Transmission (OWPT) holds a significant position for enabling cable-free energy delivery in long-distance, high-energy, and mobile scenarios. However, ensuring human and equipment safety under high-power laser exposure remains a critical challenge. This study reports a vision-based OWPT safety system that implements the principle of automatic emission control (AEC): dynamically modulating laser emission in real time to prevent hazardous exposure. While camera-based OWPT safety systems have been proposed in concept, working implementations remain extremely limited to date. Moreover, existing systems struggle with response speed and single-object assumptions. To address these gaps, this research presents a low-latency safety architecture based on a customized deep learning-based object detection framework, a dedicated OWPT dataset, and a multi-threaded control stack. The research also introduces a real-time risk factor (RF) metric that evaluates proximity and velocity for each detected intrusion object (IO), enabling dynamic prioritization among multiple threats. The system achieves a minimum response latency of 14 ms (average 29 ms) and maintains reliable performance in complex multi-object scenarios. This work establishes a new benchmark for OWPT safety systems and contributes a scalable reference for future development.
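A risk metric of this kind, combining proximity and approach velocity per detected intrusion object, could be sketched as follows. The weighting scheme and the names `d_safe` and `v_ref` are assumptions for illustration, not the paper's RF definition:

```python
def risk_factor(distance_m, closing_speed_mps, d_safe=1.0, v_ref=2.0):
    """Toy risk score for one intrusion object (IO).

    Proximity saturates at 1.0 inside the safety radius d_safe; the
    approach term counts only motion toward the beam, normalized by v_ref.
    An AEC controller would attend to the highest-risk IO first.
    """
    proximity = d_safe / max(distance_m, d_safe)
    approach = max(closing_speed_mps, 0.0) / v_ref
    return proximity * (1.0 + approach)

def highest_risk(objects):
    """Pick the (distance, closing_speed) pair with the largest risk factor."""
    return max(objects, key=lambda o: risk_factor(*o))
```

For example, a slow object already inside the safety radius outranks a fast one several meters away, which is the kind of dynamic prioritization among multiple threats the abstract describes.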
- New
- Research Article
- 10.1007/s11042-026-21174-0
- Feb 6, 2026
- Multimedia Tools and Applications
- Maosen Wang + 4 more
Research on improved YOLOv4 for open-pit mine object detection based on hybrid attention mechanism
- New
- Research Article
- 10.1088/1361-6501/ae397c
- Feb 6, 2026
- Measurement Science and Technology
- Jiapeng Li + 1 more
As a critical component of railway transportation systems, the turnout plays a key role in ensuring the safety of train operations and enhancing transport efficiency. The current signal of a turnout provides critical insight for identifying anomalies and exhibits distinct stage characteristics during its operation. To automatically identify these stages for equipment condition monitoring as well as intelligent operation and maintenance, accurate boundary detection is indispensable. Thus, this paper proposes a keypoint-guided stage segmentation approach for current-curve images of railway turnout systems. The method performs stage segmentation using solely the top-left keypoint of each region, with adjacent keypoints defining stage boundaries, dramatically reducing model complexity. To enhance boundary supervision, an anisotropic Gaussian kernel is introduced during feature map generation, and a Sobel-based auxiliary gradient constraint is incorporated to improve detection precision. Furthermore, a temporal continuity constraint is applied in the post-processing stage to ensure logical consistency between consecutive segments. Experimental results demonstrate the robust performance of the proposed method, achieving precise segmentation of turnout current-curve images under both normal and abnormal conditions. Moreover, it maintains low computational cost and ensures the rationality of the detection results.
- New
- Research Article
- 10.1142/s2424922x26500026
- Feb 6, 2026
- Advances in Data Science and Adaptive Analysis
- Blessing Agyei Kyem + 4 more
Road infrastructure maintenance in developing countries faces unique challenges due to resource constraints and diverse environmental factors. This study addresses the critical need for efficient, accurate, and locally-relevant pavement distress detection methods in these regions. We present a novel deep learning approach combining YOLO (You Only Look Once) object detection models with a Convolutional Block Attention Module (CBAM) to simultaneously detect and classify multiple pavement distress types. The model demonstrates robust performance in detecting and classifying potholes, longitudinal cracks, alligator cracks, and raveling, with confidence scores ranging from 0.46 to 0.93. While some misclassifications occur in complex scenarios, these provide insights into unique challenges of pavement assessment in developing countries. Additionally, we developed a web-based application for real-time distress detection from images and videos. This research advances automated pavement distress detection and provides a tailored solution for developing countries, potentially improving road safety, optimizing maintenance strategies, and contributing to sustainable transportation infrastructure development.
- New
- Research Article
- 10.3390/jimaging12020069
- Feb 6, 2026
- Journal of Imaging
- Qi Mi + 4 more
Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.
- New
- Research Article
- 10.1038/s41598-026-37636-2
- Feb 6, 2026
- Scientific reports
- Jong-Won Baek + 3 more
Invasive freshwater turtles are major drivers of biodiversity loss, underscoring the importance of early detection and management. However, it is challenging for experts to manually monitor a broad geographic area, necessitating support tools. Deep learning-based object detection models have displayed high performance in automating wildlife monitoring tasks. Furthermore, hyperparameter optimization, including optimizer selection and hyperparameter tuning, might further enhance performance by optimizing training settings to the dataset. In this study, an optimized model was developed by applying hyperparameter optimization to detect and classify six invasive turtle species in Korea from images. The optimized model was compared to a default model trained using the default optimizer and hyperparameters. The optimized model outperformed the default model, as indicated by evaluations of mean average precision at a fixed intersection over union threshold of 0.5 (0.973 vs. 0.959) and over a range of thresholds from 0.5 to 0.95 (0.841 vs. 0.815). The classification accuracy of the optimized model reached 92.7%, exceeding that of the default model (89.9%). These findings highlight the utility of hyperparameter optimization and suggest that the proposed approach can support the early detection of invasive turtles, thereby enhancing invasive species management.
- New
- Research Article
- 10.1038/s41598-026-37052-6
- Feb 5, 2026
- Scientific reports
- Zeran Wang + 5 more
Object detection, a cornerstone of computer vision, aims to localize and classify objects within images. This comprehensive survey reviews modern object detection methods, focusing on two dominant paradigms: Convolutional Neural Networks (CNNs) and Transformer-based architectures. This work provides a structured comparison of CNN-based and Transformer-based detection paradigms, highlighting their complementary strengths and trade-offs. CNNs demonstrate advantages in local feature extraction and computational efficiency, whereas Transformers excel at capturing global context through self-attention mechanisms. We also analyze multi-modal fusion techniques integrating Red-Green-Blue (RGB), Light Detection and Ranging (LiDAR), and language embeddings. Benchmark results from representative models include the following: the Real-Time Detection Transformer (RT-DETR) achieves 53.1% mean Average Precision (mAP) at Intersection over Union (IoU) thresholds of 0.5:0.95, You Only Look Once version 8 (YOLOv8) achieves 50.2% mAP at 0.5:0.95, real-time detectors exceed 100 frames per second (FPS) with competitive accuracy, and specialized infrared methods achieve a 92.45% F-measure on the NUAA-SIRST dataset. The work introduces a novel taxonomy of multi-modal fusion strategies, documents field-wide and review-specific limitations, and synthesizes recent 2024 to 2025 benchmarks across diverse datasets. Despite these advances, significant challenges remain in handling scale variation, occlusion effects, and domain adaptation. This survey outlines these persistent obstacles and promising research directions, providing a structured reference for researchers and practitioners.
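As a concrete reference for the metrics quoted above: mAP at 0.5:0.95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05, where IoU for a pair of axis-aligned boxes in (x1, y1, x2, y2) form is computed as in this minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction counts as a true positive only if its IoU with a ground-truth box exceeds the threshold, so the 0.5:0.95 figures reward much tighter localization than mAP at 0.5 alone.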
- New
- Research Article
- 10.1145/3786764
- Feb 5, 2026
- ACM Transactions on Internet of Things
- Hem Regmi + 3 more
We present MiHazeFree3D, a system that leverages millimeter-wave (mmWave) radar signals to predict 3D bounding boxes of vehicles and pedestrians in real-world traffic scenarios. While current 3D object detection methods rely primarily on cameras and LiDARs, their performance degrades significantly in rain, fog, or poor lighting conditions. Our system exploits mmWave radar’s ability to operate reliably in these challenging conditions, offering a complement to existing sensors without increasing computational costs. The key challenge in using mmWave for 3D detection lies in handling motion-induced errors and the specular reflection of mmWave signals. To address these issues, we developed a deep learning architecture with multiple feature fusion layers and trained it on diverse real-world scenarios. We evaluated MiHazeFree3D using data collected from mmWave radars mounted on the dashboard of an ego-vehicle driving through urban environments. Our results show that MiHazeFree3D detects and bounds both vehicles and pedestrians in tested conditions, including fog and low-light scenarios, highlighting the potential of mmWave radar for 3D object detection in autonomous driving systems.
- New
- Research Article
- 10.36948/ijfmr.2026.v08i01.67518
- Feb 4, 2026
- International Journal For Multidisciplinary Research
- Praneeth N + 3 more
Fisheye cameras are widely used in autonomous driving, traffic surveillance, parking assistance and indoor monitoring because they can capture a very wide field of view with a single lens. However, the strong radial distortion in fisheye images makes object detection much more difficult than in normal pinhole images. Traditional detectors trained on regular datasets often fail near the image borders, where objects look stretched, curved, or very small. In recent years, many researchers have proposed improved versions of YOLO and other deep learning models to solve these issues and to increase accuracy and robustness in fisheye scenarios. This paper presents a review of such methods, including attention mechanisms, contrastive learning, distortion-aware feature extraction and new bounding box designs. Important fisheye datasets like WoodScape and FishEye8K are discussed, along with benchmark results, evaluation metrics and open challenges. The aim is to give students and beginner researchers a simple, clear view of how improved YOLO-based approaches work for fisheye images, what performance they achieve and where more work is still needed.
- New
- Research Article
- 10.36948/ijfmr.2026.v08i01.67671
- Feb 4, 2026
- International Journal For Multidisciplinary Research
- Shweta Suryawanshi + 3 more
Underwater object detection is of paramount importance in applications such as marine exploration, ocean monitoring, and underwater surveillance. However, underwater images usually pose challenges such as low visibility, light scattering, color attenuation, and noise that seriously degrade detection performance. To handle these issues, an intelligent underwater object detection approach is proposed using a hybrid Convolutional Neural Network-Support Vector Machine (CNN-SVM) model. In this proposed method, the CNN is used for automatic deep feature extraction from underwater images, while an SVM robustly classifies those features. The hybrid CNN-SVM framework effectively unifies the feature learning capability of deep learning methods and the strong generalization ability of classical machine learning methods. The proposed approach has been evaluated on publicly available underwater image datasets containing fish, corals, underwater vegetation, rocks, and man-made objects. Experimental results show that the CNN-SVM model achieves higher accuracy, precision, recall, and F1 score compared to stand-alone CNN and transfer learning models such as VGG19. This demonstrates that the proposed system is reliable, robust, and well-suited for underwater object detection under challenging environmental conditions.
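The division of labor in such a hybrid, where a network supplies feature vectors and a max-margin classifier makes the final call, can be sketched with a tiny hinge-loss linear SVM trained by subgradient descent. This pure-Python toy stands in for both the CNN feature extractor and a full SVM library, so the function names, learning rates, and toy features are all illustrative assumptions:

```python
def train_linear_svm(feats, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit w, b on feature vectors with labels in {-1, +1} by minimizing
    the L2-regularized hinge loss via per-sample subgradient descent.
    In the hybrid pipeline, feats would be CNN-extracted deep features.
    """
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point violates the margin: hinge subgradient
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # point satisfied: only regularization shrinkage
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify a feature vector by the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

With real CNN features the same training loop applies unchanged; only the input vectors become higher-dimensional embeddings instead of toy 2D points.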
- New
- Research Article
- 10.1142/s0218001426590135
- Feb 4, 2026
- International Journal of Pattern Recognition and Artificial Intelligence
- Junwei Li + 5 more
Faults in power grid transmission lines, as well as foreign matter caught in them, pose a potential threat to the power system. Efficient anomaly detection is key to maintaining the stability of modern transmission systems. At present, the increasing demand for edge computing equipment makes it a trend to develop lightweight and efficient power grid anomaly detection methods. To deal with the practical demands of power grid anomaly detection, this paper introduces LPGANet, a lightweight model designed to achieve high accuracy while enhancing detection efficiency. The model integrates dynamic snake convolution (DSConv) and spatial-channel reconstruction convolution (SCConv) to strengthen multi-scale feature extraction and fusion while cutting computational cost. In addition, an EMA method is adopted to enhance focus on the foreground and reduce the impact of background. We release a new dataset containing 6,200 images of typical power grid anomalies such as broken strands, scattered strands, and other floating suspensions. The experimental results on the dataset demonstrate that LPGANet achieves the best accuracy, highest efficiency, and best comprehensive performance compared with other object detection methods. In addition, its effectiveness under computational resource limitations is verified through deployment on Jetson AGX Orin edge devices.
- New
- Research Article
- 10.1038/s41598-026-38378-x
- Feb 4, 2026
- Scientific reports
- Hong Shi + 6 more
To address the problem of insufficient detection accuracy in drone aerial and remote sensing images due to factors such as small target size, background interference, and occlusion, we propose a lightweight small object detection model, MDI-YOLO, based on multi-dimensional feature fusion of Transformer and CNN. In the MDI-YOLO model, we utilize a channel grouping strategy that combines the advantages of Transformer and CNN, proposing the C2f-Multi-Head Self-Attention-Convolutional Gated Linear Unit-Convolutional Neural Network (C2f-MCC) structure to improve the ability of the You Only Look Once v8 (YOLOv8) backbone network in extracting global features. Additionally, we propose Directional Fusion Attention (DFA), an attention mechanism that focuses on spatial and channel features across different dimensions, enhancing the model's feature representation ability. Finally, we design the Inner-Shape-Intersection over Union (Inner-Shape-IoU) loss function, which thoroughly evaluates the bounding boxes by considering their shape, scale, and position, thereby improving the model's precision in locating objects. The findings from the experiments reveal that the proposed detection model improves mAP@0.5 by 4% and mAP@0.5:0.95 by 2.5% on the VisDrone2019 dataset with the number of parameters remaining nearly unchanged. On the DOTAv1.0 dataset, mAP@0.5 is increased by 3.3%, and mAP@0.5:0.95 by 2.8%. The improved detection model not only enhances recognition accuracy but also maintains lightweight characteristics, making it suitable for drone aerial and remote sensing image detection, and strengthening the network's robustness and generalization ability.
- New
- Research Article
- 10.3390/app16031551
- Feb 3, 2026
- Applied Sciences
- Nan Ji + 2 more
Yunnan Jiama (paper horse prints), a representative form of intangible cultural heritage in southwest China, is characterized by subtle inter-class differences, complex woodblock textures, and heterogeneous preservation conditions, which collectively pose significant challenges for digital preservation and automatic image classification. To address these challenges and improve the computational analysis of Jiama images, this study proposes an enhanced object detection framework based on YOLOv8 integrated with a Global Attention Mechanism (GAM), referred to as YOLOv8-GAM. In the proposed framework, the GAM module is embedded into the high-level semantic feature extraction and multi-scale feature fusion stages of YOLOv8, thereby strengthening global channel–spatial interactions and improving the representation of discriminative cultural visual features. In addition, image augmentation strategies, including brightness adjustment, salt-and-pepper noise, and Gaussian noise, are employed to simulate real-world image acquisition and degradation conditions, which enhances the robustness of the model. Experiments conducted on a manually annotated Yunnan Jiama image dataset demonstrate that the proposed model achieves a mean average precision (mAP) of 96.5% at an IoU threshold of 0.5 and 82.13% under the mAP@0.5:0.95 metric, with an F1-score of 94.0%, outperforming the baseline YOLOv8 model. These results indicate that incorporating global attention mechanisms into object detection networks can effectively enhance fine-grained classification performance for traditional folk print images, thereby providing a practical and scalable technical solution for the digital preservation and computational analysis of intangible cultural heritage.
- New
- Research Article
- 10.1109/mcg.2026.3660508
- Feb 3, 2026
- IEEE computer graphics and applications
- Isac Holm + 3 more
The need for large, high-quality annotated datasets continues to represent a primary limitation in training Object Detection (OD) models. To mitigate this challenge, we present VILOD, a Visual Interactive Labeling tool that integrates Active Learning (AL) with a suite of interactive visualizations to create an effective Human-in-the-Loop (HITL) workflow for OD annotation and training. VILOD is designed to make the AL process more transparent and steerable, empowering expert users to implement diverse, strategically guided labeling strategies that extend beyond algorithmic query strategies. Through comparative case studies, we evaluate three visually guided labeling strategies against a conventional automated AL baseline. The results show that a balanced, human-guided strategy, leveraging VILOD's visual cues to synthesize information about data structure and model uncertainty, not only outperforms the automated baseline but also achieves the highest overall model performance. These findings emphasize the potential of visually guided, interactive annotation to enhance both the efficiency and effectiveness of dataset creation for OD.
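An automated AL baseline of the kind the visual strategies are compared against is typically an uncertainty-based query strategy such as least-confidence sampling. A minimal sketch (the function name and data layout are assumptions, not VILOD's API):

```python
def least_confidence_query(scores_by_image, k=2):
    """Pick the k unlabeled images whose best detection score is lowest,
    i.e., where the detector is least confident, to send for annotation.

    scores_by_image: dict mapping image id -> list of detection confidences.
    Images with no detections get confidence 0.0 (maximally uncertain).
    """
    top = {img: max(s, default=0.0) for img, s in scores_by_image.items()}
    return sorted(top, key=top.get)[:k]
```

A HITL tool like the one described layers visual cues over this kind of ranking, letting the user override which uncertain or structurally interesting samples get labeled next.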
- New
- Research Article
- 10.1007/s11571-025-10407-x
- Feb 3, 2026
- Cognitive neurodynamics
- Ziyue Yang + 6 more
Camouflaged Object Detection (COD), the task of identifying objects concealed within their environments, has seen rapid growth due to its wide range of practical applications. We propose a human-machine collaboration framework for COD, leveraging the complementary strengths of computer vision (CV) models and noninvasive brain-computer interfaces (BCIs). Our approach introduces a multiview backbone to estimate uncertainty in CV predictions, utilizes this uncertainty during training to improve efficiency, and defers low-confidence cases to human evaluation via RSVP-based BCIs during testing for more reliable decision-making. Evaluated on the CAMO dataset, our framework achieves state-of-the-art results with an average improvement of 4.56% in balanced accuracy (BA) and 3.66% in the F1 score. For the best-performing participants, improvements reached 7.6% in BA and 6.66% in the F1 score. Training analysis showed a strong correlation between confidence and precision, while ablation studies confirmed the effectiveness of our training policy and human-machine collaboration strategy. This work reduces human cognitive load, improves system reliability, and provides a foundation for advancements in real-world COD applications and human-computer interaction. Our code and data are available at: https://github.com/ziyuey/Uncertainty-aware-human-machine-collaboration-in-camouflaged-object-identification.
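The defer-to-human step, routing low-confidence CV predictions to the RSVP-BCI stage, reduces to a selective-prediction rule at test time. A minimal sketch, with the threshold value and names chosen for illustration:

```python
def route(prediction, confidence, threshold=0.7):
    """Accept the CV model's answer when it is confident enough;
    otherwise defer the case to human evaluation (e.g., RSVP-based BCI)."""
    if confidence >= threshold:
        return ("model", prediction)
    return ("human", None)

def triage(cases, threshold=0.7):
    """Split (prediction, confidence) pairs into auto-decided and deferred."""
    auto = [c for c in cases if c[1] >= threshold]
    deferred = [c for c in cases if c[1] < threshold]
    return auto, deferred
```

The training-time use of uncertainty described above is the complementary half: by knowing which cases will be deferred anyway, the CV model can concentrate its capacity on the cases it will actually decide.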
- New
- Research Article
- 10.1162/neco.a.1483
- Feb 2, 2026
- Neural computation
- Faris B Rustom + 3 more
Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of "clear" and "camouflaged" animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. 
Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.
- New
- Research Article
- 10.1016/j.neunet.2025.108077
- Feb 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Haoke Xiao + 4 more
Disentangled self-supervised video camouflaged object detection and salient object detection.
- New
- Research Article
- 10.1016/j.neunet.2025.108174
- Feb 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Xinmiao Gao + 2 more
Domain adaptive underwater object detection via complementary style-aware learning.