Optimization of multi-objective recognition based on video tracking technology

Abstract

Traditional video multi-target recognition technology struggles to identify criminal suspects rapidly in complex scenes. To address this shortcoming, a multi-target recognition optimization method based on video tracking technology is proposed. The method constructs a multi-target recognition algorithm based on video feature matching, defines a feature vector and a similarity function, and introduces the Kalman filter algorithm to improve the accuracy and real-time performance of suspect recognition. In experiments, the proposed model performed well in terms of tracking error: the highest precision was 94.75%, the recall rate was 96.59%, the tracking error on the horizontal axis was only 3.75%, and the tracking error on the vertical axis was 3.27%. In the crime detection video application, the accuracy–recall curve of the model reached 0.94 and the feature recall rate was 94.83%, verifying the effectiveness and robustness of the model in complex, fast-changing scenes. The results show that the proposed model is feasible and robust for rapidly identifying criminal suspects. In addition, the work offers new technical concepts for improving target tracking precision and adapting to real-time scene changes, opening new research avenues in the field of multi-target recognition.
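
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea it describes: a Kalman filter predicts where a tracked target should appear, and a similarity function over feature vectors decides which detection belongs to that target. The constant-velocity motion model, the cosine similarity, the noise settings, and the names `ConstantVelocityKalman`, `match_target`, `gate`, and `min_sim` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter over state [x, y, vx, vy] with position-only measurements.

    The constant-velocity model and the noise values are illustrative assumptions.
    """

    def __init__(self, x, y, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0                      # initial state covariance
        self.F = np.array([[1, 0, dt, 0],              # state transition
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],               # we observe x and y only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var               # process noise
        self.R = np.eye(2) * meas_var                  # measurement noise

    def predict(self):
        """Advance the state one frame and return the predicted (x, y)."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, z):
        """Correct the state with a measured position z = (x, y)."""
        z = np.asarray(z, dtype=float)
        residual = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.state = self.state + K @ residual
        self.P = (np.eye(4) - K @ self.H) @ self.P


def cosine_similarity(a, b):
    """Similarity function between two feature vectors (assumed: cosine)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_target(track_feature, predicted_xy, detections, gate=50.0, min_sim=0.5):
    """Return the index of the best-matching detection, or None.

    detections: list of ((x, y), feature_vector) pairs for the current frame.
    Detections farther than `gate` pixels from the prediction are ignored;
    among the rest, the one with the highest similarity above `min_sim` wins.
    """
    best_idx, best_sim = None, min_sim
    for i, (xy, feat) in enumerate(detections):
        if np.linalg.norm(np.asarray(xy, dtype=float) - predicted_xy) > gate:
            continue
        sim = cosine_similarity(track_feature, feat)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

In a tracking loop under these assumptions, each frame would call `predict()`, pass the predicted position to `match_target`, and call `update()` with the matched detection's position; the gating step is one common way to keep the feature comparison cheap enough for real-time use.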

Similar Papers
  • Research Article
  • Cite Count Icon 3
  • 10.1007/s42452-025-07116-9
Football sports video tracking and detection technology based on YOLOv5 and DeepSORT
  • May 29, 2025
  • Discover Applied Sciences
  • Bin Wang

Enhancing the analysis of football sports video has great practical significance and commercial value for tactical analysis, player performance evaluation, tournament broadcasting, and many other applications. Because current target detection and tracking must cope with complex scene changes, target occlusion, and pronounced external motion interference, the study proposes improvements to the YOLOv5 and DeepSORT algorithms to raise the tracking and detection accuracy for sports video. First, the model is given a lightweight network architecture, and an attention mechanism is introduced to improve feature extraction capability and target detection accuracy. Then, an unscented Kalman filter is introduced into the DeepSORT algorithm to improve target matching and strengthen target tracking. The results indicated that the average accuracy of the improved YOLOv5 model for target detection was more than 90%, while the number of computational parameters was effectively reduced. Detection performance under target overlap and under uneven lighting and shadows exceeded 90%, at least 2% better than the other algorithms compared. In target tracking, the AUC values of the proposed algorithm exceeded 85% in different scenarios; the algorithm was less affected by the overlap threshold and achieved high tracking accuracy. It demonstrated the highest successful tracking rate and more stable performance.

  • Conference Article
  • Cite Count Icon 5
  • 10.1109/icce.2014.6776037
Level-of-detail AR: Managing points of interest for attentive augmented reality
  • Jan 1, 2014
  • Min-Hyuk Sung + 3 more

In this paper, we present level-of-detail (LOD) augmented reality (AR), which is a novel approach to handling multi-layered information of the target image. Previously, multitarget recognition and tracking methods were used to handle augmentation in a complex scene. In more complex situations, when the target can be divided into depth-based layers, it is not feasible to simply employ multi-target methods. To overcome this problem, we propose a tree structure of points of interest (POI) and a practical method that identifies the parts that attract maximum user attention. We demonstrate the feasibility of our approach by implementing a mobile LOD AR system that handles very large targets that are commonly encountered in real-world situations such as in museums.

  • Research Article
  • Cite Count Icon 2
  • 10.1109/tgrs.2025.3628639
Difference Enhancement and Inter-Scale Interactive Fusion Mamba for Remote Sensing Image Change Detection
  • Jan 1, 2025
  • IEEE Transactions on Geoscience and Remote Sensing
  • Weiwei Sun + 5 more

Recently, Mamba has made significant strides in sequence modeling, with its global receptive field, dynamic weighting strategy and linear growth in computational complexity. In remote sensing (RS) change detection (CD), several studies have demonstrated that Mamba leverages a unique scanning mechanism to traverse images from various directions, showcasing excellent long-range modeling capabilities. However, as the network depth increases, Mamba often struggles to retain shallow textures and local features effectively. In particular, modern RS images frequently capture complex surface scenes, including seasonal climate variations and densely built environments, making local contextual details crucial for effective CD. Therefore, a difference enhancement and inter-scale interactive fusion Mamba (DEIF-Mamba) is proposed to alleviate the issue. The overall network framework integrates CNN and Mamba, utilizing CNN to capture local feature information, while Mamba employs a cross-scanning mechanism to integrate global information. To address the interference caused by mixed texture features and the missed detection of subtle changes in complex scenes, a differential feature enhancement module (DFEM) is proposed to enrich local contextual details and improve feature representation. In addition, we propose an inter-scale interactive fusion (ISIF) strategy to fully utilize the cross-scale interactive information and minimize information redundancy. Extensive experiments on four CD datasets demonstrate that the proposed DEIF-Mamba achieves an average F1 of 85.87%, and shows superior performance compared with other state-of-the-art (SOTA) methods. Code will be available online (https://github.com/Jyl199904/DEIF-Mamba).

  • Research Article
  • Cite Count Icon 11
  • 10.1364/josaa.29.00a174
A chromatic diversity index based on complex scenes
  • Jan 26, 2012
  • Journal of the Optical Society of America A
  • João Manuel Maciel Linhares + 1 more

We propose a chromatic diversity index based on the Munsell set capable of predicting illuminant induced changes in chromatic diversity of complex scenes. The color differences between complex scenes derived from hyperspectral data under a test and under a reference CIE D65 illuminant were computed and compared with the corresponding differences for the Munsell set. It was found that the average color difference between the complex scenes correlates well with the color differences of the Munsell samples with an average correlation of about 0.94, a result indicating that the Munsell set can be used to predict chromatic changes in complex scenes.

  • Research Article
  • 10.1142/s0219265922420038
Mobile Big Data Analytics for Human Behavior Recognition in Wireless Sensor Network Based on Transfer Learning
  • Jan 4, 2023
  • Journal of Interconnection Networks
  • Zhexiong Cui + 1 more

Big data analysis of human behavior can provide a basis and support for applications in various scenarios. Using sensors to analyze human behavior is an effective means of identification and is valuable for research. To address the low recognition accuracy and low recognition efficiency of traditional human behavior recognition (HBR) algorithms in complex scenes, we propose an HBR algorithm for mobile big data analytics in wireless sensor networks using improved transfer learning. First, data from different wireless sensors are fused to obtain human behavior mobile big data, and then, by analyzing the importance of human behavior features (HBF), the dynamic change parameters of the HBF extraction threshold are calculated. Second, using the dynamic change parameters of the threshold, the HBF of complex scenes are extracted. Finally, the best classification function of human behavior in complex scenes is obtained by using the classification function of HBF in complex scenes. Human behavior in complex scenes is classified according to the HBF in the feature set. The HBR algorithm is designed using the improved transfer learning network to recognize human behavior in complex scenes. The results show that the proposed algorithm can accurately recognize up to 22 HBF points and can keep the HBR time within 2 s. The human behavior false recognition rate of miscellaneous scenes is less than 10%. The recognition speed is above 10/s, and the recall rate can reach more than 98%, which improves the HBR ability in complex scenes.

  • Research Article
  • Cite Count Icon 24
  • 10.1016/j.image.2009.03.003
A two-pass rate control algorithm for H.264/AVC high definition video coding
  • Mar 29, 2009
  • Signal Processing: Image Communication
  • Dongdong Zhang + 2 more


  • Research Article
  • 10.1117/1.jei.35.2.023008
Improved brain-inspired DeeplabV3+ algorithm for complex scenes
  • Mar 11, 2026
  • Journal of Electronic Imaging
  • Huan Liu + 4 more

In response to the issues of spatial detail loss and insufficient feature extraction in the existing DeeplabV3+ model of complex scenes, we employ the visual processing mechanisms of the human brain and propose an improved brain-inspired DeeplabV3+ algorithm for complex scenes. First, MobileNetV2 is used to replace Xception as the backbone network, and the feature extraction process is further optimized by the coordinate attention model so that the model can capture spatial information more efficiently and avoid the loss of details in complex scenes. Second, the continuous-coupled neural network model is applied after each atrous convolution, and the characteristics of a brain-like neural network are used to capture more spatial context information and enhance the feature extraction ability of the model. Finally, the simple attention module is added after the shallow features of the backbone network, the extraction ability of key features is further enhanced by weighting the input features, and the clarity of the segmentation boundary is improved. To rigorously evaluate the performance of the proposed algorithm, we conduct extensive experimental validation on two publicly available datasets: the PASCAL VOC2012 and DLRSD datasets. The experimental results show that the mean intersection over union (mIoU) of the proposed algorithm on the PASCAL VOC2012 dataset reaches 73.75%, which is 2.37% higher than that of the original DeeplabV3+ algorithm, and the mIoU on the DLRSD dataset reaches 64.88%, that is, an increase of 3.36%. At the same time, the mean pixel accuracy achieves 83.6% and 80.8%, respectively, demonstrating significant superiority compared with other classical algorithms. In addition, ablation experiments verify the fusion effectiveness of the improved modules and prove that the proposed algorithm can effectively improve the segmentation accuracy and detail performance in complex scenes. 
We not only provide a new solution for image semantic segmentation in complex scenes but also contribute an important theoretical and practical reference for the design of lightweight and high-precision semantic segmentation models.

  • Conference Article
  • 10.1145/3007669.3007675
Scene Adaptive Object Tracking Combining Local Feature and Color Feature
  • Aug 19, 2016
  • Quan Miao + 1 more

Scene changes like scale, rotation, illumination and occlusion often occur in video sequences, which raise challenges to robust object tracking. This paper presents a new on-line object tracking method adapting to different scene changes, by combining local feature and color feature. First, object tracking is treated as a keypoint matching problem. SURF features are detected, described and further categorized according to different scene changes and undergo dynamic clustering. In addition, color feature is constructed to better choose the image domain for matching. Online updating is performed on SURF feature and color feature once tracking is successful. Experimental results validate the robustness and accuracy of the proposed method under complex scene changes.

  • Research Article
  • Cite Count Icon 2
  • 10.25103/jestr.135.05
Long-term Object Tracking Based on Improved Continuously Adaptive Mean Shift Algorithm
  • Jan 1, 2020
  • Journal of Engineering Science and Technology Review
  • Jinping Sun + 4 more

Long-term object tracking encounters complex scene changes, such as deformation, short-term departure from sight, occlusion, and lighting changes, resulting in complex and unstable tracking. To improve the accuracy and success rate of long-term object tracking in complex scenes, an improved continuously adaptive mean shift (CAMShift) algorithm was proposed. The joint probability density distribution of the target model was obtained by using the Bhattacharyya coefficient to calculate the contribution of the color features and texture features. Combining the fused target model with a Kalman filter, the target position was obtained by running the CAMShift algorithm. Finally, a template pool was designed to store high-confidence tracking results. The target template was updated online by retrieving the initial frame from the template pool so that the target could be re-detected after tracking drift or failure. The accuracy of the proposed algorithm was verified by simulation analysis. Results show that the distance precision and success rate of the proposed algorithm are 0.9 and 0.83, respectively. The proposed algorithm effectively solves long-term target tracking problems affected by complex scenes, such as occlusion, similar colors, and deformation. This study provides references for the automatic detection of traffic incidents in the intelligent traffic monitoring system.

  • Research Article
  • Cite Count Icon 107
  • 10.1016/j.compag.2019.104982
FLYOLOv3 deep learning for key parts of dairy cow body detection
  • Sep 12, 2019
  • Computers and Electronics in Agriculture
  • Bo Jiang + 5 more


  • Research Article
  • 10.5194/isprs-archives-xlviii-4-w17-2025-317-2026
Solar Panel Segmentation in High-Resolution Satellite Imagery: A YOLOv8-GIS Approach in the Marrakech-Safi Region, Morocco
  • Jan 15, 2026
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Mohamed Smouni + 3 more

Green energy usage in Morocco is gaining traction, particularly in the realm of solar panels, which hold great potential for use in agriculture and residential settings. Recently, there has been growing interest in exploring ways to automatically gather important information about solar installations in specific geographic areas of interest. To address this goal, we developed a geoAI approach that utilizes high-resolution satellite imagery and the YOLOv8 computer vision algorithm for accurate solar panel segmentation in the Marrakech-Safi region of Morocco. Training images were obtained from open-source, annotated datasets available on the web, and we pseudo-labeled images from our Area of Interest using a semi-supervised learning approach. We built, trained, and tested the solar panel dataset, which included 4660 images. Subsequently, we performed geoprocessing analysis to extract estimated geometric parameters such as the area, perimeter, and angles of the segmented solar panels. These shape parameters were then employed in unsupervised machine learning to detect anomalies in the segmented data by using the Isolation Forest algorithm. Precision, recall rate, and mAP50 were used for the evaluation of the YOLOv8 segmentation model. The results showed a high precision rate of 96.9%, a recall rate of 97.6%, and an mAP score of 0.99, indicating the effectiveness of the YOLOv8 segmentations in accurately segmenting solar panels. Our approach successfully segmented 18,050 PV modules, covering an estimated area of 1.47 km² in the study area, with an average confidence of 89%. This demonstrates the model's capability to accurately identify and isolate solar panels within complex scenes. The high precision and recall rates suggest that our approach is robust for large-scale solar panel detection in diverse landscapes. Successfully segmenting over 18,000 PV modules indicates the scalability of our method. Additionally, integrating geoprocessing analysis and the Isolation Forest algorithm enhances our approach, allowing for the identification of anomalies in solar panel installations. This research provides valuable insights into the extent of solar panel adoption in the Marrakech-Safi region, offers a robust methodology for large-scale solar installation mapping, and establishes a foundation for future nationwide studies, potentially informing energy policies and supporting sustainable development initiatives across Morocco.

  • Research Article
  • Cite Count Icon 1
  • 10.1364/oe.562136
Shape from polarization via a physical prior-based deep fusion network with ambiguous surface normals
  • Jun 9, 2025
  • Optics Express
  • Baolin Wang + 3 more

Shape from polarization imaging is a passive three-dimensional imaging method with high precision and detailed reconstruction capabilities, widely used in fields such as intelligent manufacturing, surface defect detection, and medical imaging. However, existing deep learning-based shape from polarization methods have not effectively fused polarization and other physical prior information, leading to degraded reconstruction quality in complex scenes. To address this issue, we propose an innovative method that integrates polarization information with ambiguous surface normals and specular confidence information. By incorporating these additional prior features, the model's reconstruction accuracy in complex scenes is significantly improved. This research calculates the ambiguous surface normals and specular confidence information based on a physical model and introduces a novel dual-branch deep fusion network. The network efficiently extracts multi-scale feature information through a feature extraction module and effectively fuses polarization information with ambiguous surface normals and specular confidence information via a feature fusion module, enhancing the reconstruction accuracy in complex scenes. Experimental results demonstrate that the proposed method can accurately reconstruct surface normal under complex lighting and low-texture conditions, significantly improving the accuracy and robustness of shape from polarization methods. The method is expected to have broad applications in intelligent manufacturing, surface defect detection, and medical imaging. Our dataset and source code will be publicly available at https://github.com/singobl/CGA-Transformer.

  • Book Chapter
  • 10.1007/978-981-99-2092-1_30
Sports Video Tracking Technology Based on Optimized Decision Tree Algorithm(DTA)
  • Jan 1, 2023
  • Zhong Wu

As video surveillance systems are used in more and more settings, they are steadily becoming more intelligent, and moving target detection and tracking has always been key to that intelligence. This paper studies the optimized DTA in sports video tracking technology (VTT). For moving target tracking, it first summarizes the existing classification of moving target tracking and the common tracking methods, then introduces an improved Epanechnikov kernel function target tracking algorithm and incorporates the optimized DTA into the tracking technology. The experimental data show that, across the six groups of data tested, the accuracy of the optimized DTA exceeded 75% and the lowest recall was 77.45%, indicating that the optimized DTA has good overall performance and high precision in sports VTT.

  • Research Article
  • Cite Count Icon 30
  • 10.1177/0278364911399340
Scene parsing using a prior world model
  • Jun 3, 2011
  • The International Journal of Robotics Research
  • Gregory D Hager + 1 more

We present a new paradigm for constructing a 3D model of a scene from images. Our approach makes strong use of a prior 3D model of the scene. Changes from scene to scene are regarded as a Markov dynamical system, which is described by a probabilistic transition model. From the prior 3D scene model, the model of scene change dynamics, and a newly acquired image, we compute the new 3D scene model which is most consistent with the observed image and the changes from the prior model. The use of a prior 3D scene model allows the method to deal with complex scenes, maintain hidden state, respect object persistence, perform object segmentation, and provides computational efficiencies. In this paper we formalize a mathematical framework for physically consistent 3D scene models, and changes to scene models that preserve physical consistency. From this framework, we first derive a generic scene model optimization algorithm for the general 3D scene interpretation problem, and we then present a polynomial time approximation for this algorithm. We detail the implementation of the algorithm for range images computed by stereo imaging, and present extensive experimental results on sequences of scenes containing dozens of objects and multiple changes from scene to scene.

  • Research Article
  • Cite Count Icon 12
  • 10.1097/01.wnr.0000223390.36457.b4
Is there a mismatch negativity during change blindness?
  • Jul 17, 2006
  • NeuroReport
  • Ross M Henderson + 1 more

The mismatch negativity is an event-related potential that represents a preattentive change detection process. The aim of this study was to determine whether the mismatch negativity was present during 'change blindness', a striking phenomenon in which surprisingly large changes in a complex scene are not seen when they occur during a blink or an eye movement. In this study, large orientation changes elicited a candidate mismatch negativity between 180 and 320 ms that appeared to be independent of participants' performance (uncued 76% correct, miscued 59% correct with chance performance at 50%). This negativity, however, disappeared in the miscued 'change blind' condition. In conclusion, the mismatch negativity does not appear to be present during change blindness suggesting that in complex scenes even large changes may not trigger preattentive change detection processes.
