Improved algorithm for tracking an object of one of the several predefined types
A new neural network algorithm for tracking objects observed in frames of video has been developed. The algorithm enables automatic detection of objects of one of the predefined types, reliable subsequent tracking, rapid redetection of the object if tracking was interrupted, and detection of a different object of the desired type if the tracked object disappears. Detection of the object of interest in video frames is performed using a neural network detector, and tracking is carried out by the developed algorithm using a neural network transformer.
- Research Article
- 10.37661/1816-0301-2025-22-1-66-72
- Mar 31, 2025
- Informatics
Objectives. The article presents the results of calculating and comparing the characteristics of the algorithm proposed by the authors in [1] for tracking an object captured by a video camera, applied to the urgent task of automatic detection and tracking of drones. Two algorithms were selected for the comparative analysis: the well-known open-source ByteTrack tracker and a simple tracker that combines a neural network, correlation comparison, and a Kalman filter. The first tracker was chosen because it can be implemented in C++ without third-party libraries and frameworks and can run on small computers in real time. The second tracker was used to determine how much better new trackers are than simple, long-used ones. The distinguishing features of the algorithms used are automatic detection and capture of a drone, its subsequent reliable tracking, quick re-capture after a tracking failure, and capture of another drone when the tracked object disappears. In the trackers used, drones are detected in video frames by a neural network detector, and tracking is performed by the neural network detector together with the developed tracking algorithms. Methods. To perform a comparative analysis of object tracking algorithms, two datasets consisting of video frames that contain drone images were created and labeled. The training dataset consists of 36895 frames, whereas the test dataset contains 8678 images. The videos of the training and test datasets were shot with different cameras under different conditions. To train the neural network part of the trackers, versions of the algorithms were written in Python; to calculate and analyze the characteristics under conditions close to real ones, versions were written in C++, which required converting the trained network using the TensorRT framework. Software tools for gathering and processing experimental data were also implemented. Results.
The comparative analysis of the three object tracking algorithms made it possible to calculate and compare the characteristics of these trackers and to draw conclusions about the method of training the neural network detector used; about the possibility of running the trackers in real time on budget personal computers with budget video cards that support the CUDA software and hardware architecture; and about the applicability of two of them to the practical tracking of drones observed by video cameras with sufficient accuracy and reliability. Of the three tested algorithms, the tracker previously developed by the authors has the best characteristics. Conclusion. The comparative analysis of the above-mentioned trackers showed that the authors' tracker and the ByteTrack algorithm can be applied in practice to the problem of tracking drones; however, detecting small-sized unmanned aerial vehicles remains a problem.
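The capture, track, re-capture, and switch behaviour described in the abstract above can be illustrated with a small state machine. This is only a minimal sketch under stated assumptions, not the authors' algorithm and not ByteTrack: association by intersection over union (IoU) is swapped in for the unspecified matching step, and the class name `SingleTargetTracker` and the thresholds `match_iou` and `max_missed` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class SingleTargetTracker:
    match_iou: float = 0.3   # hypothetical association threshold
    max_missed: int = 5      # hypothetical: missed frames tolerated before switching
    target: Optional[Box] = None
    missed: int = 0

    def update(self, detections: List[Tuple[Box, float]]) -> Optional[Box]:
        """Feed one frame of (box, score) detections; return the tracked box."""
        if self.target is None:
            # Capture: take the most confident detection, if there is one.
            if detections:
                self.target = max(detections, key=lambda d: d[1])[0]
                self.missed = 0
            return self.target
        # Track: pick the detection that overlaps the last known box the most.
        best = max(detections, key=lambda d: iou(self.target, d[0]), default=None)
        if best is not None and iou(self.target, best[0]) >= self.match_iou:
            self.target, self.missed = best[0], 0
            return self.target
        # Miss: keep the old box for a while so the same object can be re-captured.
        self.missed += 1
        if self.missed > self.max_missed:
            # Target considered gone: switch to another detection if one exists.
            self.target = max(detections, key=lambda d: d[1])[0] if detections else None
            self.missed = 0
        return self.target
```

A real tracker would typically predict the box between frames (for example with a Kalman filter, as in the simple baseline mentioned above) instead of holding the last box unchanged.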
- Research Article
- 10.29235/1561-8323-2024-68-2-105-111
- Apr 29, 2024
- Doklady of the National Academy of Sciences of Belarus
An algorithm for tracking an object observed in video frames is presented. The specific features of the constructed algorithm are the automatic detection and capture of an object of one of the predetermined types, its subsequent reliable tracking, the rapid re-capture of the tracked object after a tracking failure, and the capture of another object of the desired type if the tracked object disappears. An object of interest in video frames is detected using a neural network detector, whereas tracking is performed by the developed algorithm.
- Research Article
- 10.18522/2311-3103-2020-1-188-199
- Mar 1, 2020
- IZVESTIYA SFedU. ENGINEERING SCIENCES
The article explores modern neural network architectures for the automatic detection and recognition of marine surface objects and obstacles of given classes throughout the full image area, suitable for execution in real or near-real time on an optoelectronic vision system to automate and improve the safety of civil marine navigation. A formal statement of the problem of automatic object detection in images is given. State-of-the-art algorithms for detecting objects in images using convolutional neural networks are reviewed and compared, and a reasoned choice is made in favor of the neural network architecture that is most efficient in terms of the ratio of computational complexity to recognition accuracy. The subject area is studied, as are publicly available databases of surface objects suitable for training algorithms based on artificial neural networks. The article concludes that there is insufficient labeled data for training neural network algorithms; as a result, the authors independently collected research images and video sequences and prepared and labeled the collected data, which contain surface marine objects and other obstacles that represent a navigation hazard for ships. Based on the selected neural network architecture, a new neural network algorithm for automatic full-frame detection and recognition of surface objects was developed, and an artificial neural network was trained using the prepared database of images of typical objects. The resulting algorithm was tested by the authors on a validation dataset, the quality of its work was estimated using various metrics, and the algorithm’s performance was measured.
Conclusions are drawn about the need to expand the collected database of images of typical marine objects, and further steps are proposed to improve the accuracy of the developed software and algorithmic complex and to implement it in a marine optoelectronic machine vision system for automating and improving the safety of civil navigation.
- Video Transcripts
- 10.48448/03qe-be54
- Apr 18, 2021
- Underline Science Inc.
Introduction. We study low-density parity-check (LDPC) coding [1] and an iterative decoding system as signal processing for shingled magnetic recording (SMR) [2] in two-dimensional magnetic recording (TDMR). We previously reported waveform equalization using a two-dimensional finite impulse response (TD-FIR) filter [3] and an inter-track interference (ITI) canceler [4] and showed that the influence of ITI is reduced. We also proposed a neural network detector (NND) and evaluated the performance of the first decoding by the NND [5]. In this study, the NND iteratively calculates the log-likelihood ratio (LLR) as the decoding reliability, using the returned sum-product (SP) decoder [6] output sequence as well as the TD-FIR filter [3] output sequence. Furthermore, we compare iterative decoding using an NND with iterative decoding using a soft-output Viterbi algorithm (SOVA) detector with a signal-dependent noise predictor [7], [8].
Read/write system. The input sequence passes through a 128/130 (0, 16/8) run-length limited (RLL) encoder and a (3, 30)-regular LDPC encoder to form the recording sequence, which is recorded on a granular medium model [4] under the specification of 4 Tbit/inch². In the reading process, the decoding target track and both adjacent tracks are read at the same time by an array head composed of three readers [1], [2], and a different additive white Gaussian noise (AWGN) sequence is added to each waveform as the system noise. The signal-to-noise ratio for the system noise at the reading point is defined as SNR_S = 20 log10(A/σ_S) [dB], where A is the positive saturation level of the waveform reproduced from an isolated magnetic transition and σ_S is the root-mean-square (RMS) value of the system noise in the bandwidth of the channel bit rate f_c. A channel bit response including the read/write (R/W) process on the intended track is equalized to the partial response class-I (PR1) target by an equalizer composed of three low-pass filters (LPFs) with cut-off frequency x_h normalized by f_c and a TD-FIR filter with N_t taps per reader [3]. We assume x_h = 0.4 and N_t = 15. The output waveform from the PR1 channel is then iteratively decoded by turbo equalization performed between an NND and an SP decoder [6]. The SP decoder itself decodes iteratively using the constraints of the LDPC code, up to a maximum of i_sp iterations. Furthermore, the SP decoder returns the reliability sequence, including the parity bits of the LDPC code, to the NND. In this way, turbo equalization for the target track is performed up to a maximum of i_global iterations. After the given number of turbo equalization iterations, the output sequence is obtained by passing the posterior probability sequence, excluding parity, through a hard decision unit and the RLL decoder. The bit error rate (BER) is then calculated by comparing the input sequence with the output sequence.
Neural network detector. Figure 1 shows the block diagram of the turbo equalization. In the figure, D is the delay operator for one bit interval and N_m (m = 1∼3) is the number of elements in the mth layer. We adopt N_1 = 30, N_2 = 10, and N_3 = 1. The NND consists of the neural network, the memory, the selector, and the LLR calculator. The neural network provides outputs for 3-bit patterns in the down-track direction from the TD-FIR filter outputs and the returned SP decoder outputs, and stores them in the memory. In the training process using the back-propagation algorithm, we set the training signal to “1” for the target bit pattern and “0” for the others, in order to obtain the connection weight sets w_ij^(m)(n) between the ith element in the mth layer and the jth (j = 1∼N_(m-1)) element in the (m - 1)th layer for the nth pattern (n = 1∼8). Furthermore, the LLR calculator provides the logarithmic ratio of the maximum values for the center bit “1” and “0” from the selector.
Performance evaluation. Figure 2 shows the BER performance versus SNR_S. The circles and triangles show the performance of the NND and the SOVA detector, respectively. Here, the LLR of the SOVA detector is provided by the metric of the PR1 channel, where the metric is calculated considering the external LLR obtained from the SP decoder output [5]-[7]. The turbo equalization parameters i_sp and i_global are set to the values that minimize the BER for each detector. As can be seen from the figure, the system with the NND improves the SNR required to achieve error-free decoding by about 5.5 dB compared with the system with the SOVA detector.
Acknowledgments. This work was supported in part by the Advanced Storage Research Consortium (ASRC).
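Two of the quantities defined in the transcript above are simple enough to show numerically: the system-noise SNR definition and the LLR formed from the per-pattern network outputs. The sketch below is an illustration only, not the authors' implementation. The function names are hypothetical, patterns are indexed 0 to 7 rather than 1 to 8, the middle bit of the binary index is assumed to be the center bit, and the small `eps` guard against a zero denominator is an addition not mentioned in the source.

```python
import math
from typing import Sequence


def noise_rms_from_snr(snr_db: float, saturation_level: float = 1.0) -> float:
    """Invert SNR_S = 20*log10(A / sigma_S) to obtain the noise RMS sigma_S."""
    return saturation_level / (10.0 ** (snr_db / 20.0))


def center_bit_llr(pattern_outputs: Sequence[float], eps: float = 1e-12) -> float:
    """Log ratio of the largest output among patterns whose center bit is 1
    to the largest output among patterns whose center bit is 0.

    pattern_outputs[n] is the network output for 3-bit pattern n (0..7),
    read as the binary number b2 b1 b0 with b1 as the center bit
    (an indexing convention assumed for this sketch).
    """
    if len(pattern_outputs) != 8:
        raise ValueError("expected one output per 3-bit pattern (8 values)")
    ones = max(v for n, v in enumerate(pattern_outputs) if (n >> 1) & 1)
    zeros = max(v for n, v in enumerate(pattern_outputs) if not (n >> 1) & 1)
    return math.log((ones + eps) / (zeros + eps))
```

With this convention a positive LLR favours a center bit of "1" and a negative LLR favours "0"; equal maxima give an LLR of zero.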
- Conference Article
- 10.1109/ijcnn.1990.137588
- Jan 1, 1990
An integrated approach using neural networks for detecting and diagnosing process failures is presented. The system consists of three major components: quantitative networks, qualitative networks, and inverse qualitative networks. It effectively reduces the inherent ambiguity of forward-mapping neural networks by incorporating inverse-mapping neural networks, which correspond to the mapping from the fault space to the symptom space, and identifies the most plausible case in a process. The system is tested on four kinds of possible fault groups, including novel single faults, two two-fault groups, and sensor faults. It is shown that, owing to the successful integration of quantitative and qualitative information associated with process data, the system can substantially improve diagnostic performance without additional information.
- Book Chapter
- 10.3233/apc220065
- Nov 3, 2022
Object detection is one of the most basic and central tasks in computer vision and is a subset of object recognition. Its task is to find all objects of interest in an image and determine their categories and locations. Object detection is widely used and has strong practical value and research prospects. Applications include face detection, pedestrian detection, and vehicle detection. In recent years, with the development of convolutional neural networks, significant breakthroughs have been made in object detection. This work aims to detect objects in video frames. It detects household objects and predicts where an object may be present. A convolutional neural network (CNN) is used to detect objects in the environment, and ResNet50 is then used to classify the images into objects. A support vector machine (SVM) is then trained on the objects, which are stored in an object database and retrieved whenever the neural network sends an object for verification.
- Supplementary Content
- 10.25394/pgs.11320097.v1
- Dec 6, 2019
- Figshare
An autonomous vehicle depends on a combination of the latest technologies and ADAS safety features such as Adaptive Cruise Control (ACC), Autonomous Emergency Braking (AEB), Automatic Parking, Blind Spot Monitor, Forward Collision Warning or Avoidance (FCW or FCA), and Lane Departure Warning. The current trend is to implement these technologies with artificial or deep neural networks in place of the traditionally used algorithms. Recent research in deep learning and the development of capable processors for autonomous or self-driving cars have shown ample promise, but hardware deployment remains complex because of limited resources such as memory, computational power, and energy. Deploying several of the ADAS safety features mentioned above using multiple sensors and individual processors increases the integration complexity and also results in a distributed system, which is pivotal for autonomous vehicles. This thesis attempts to tackle two important ADAS safety features, Forward Collision Warning and Object Detection, using machine learning and deep neural networks, and their deployment on an autonomous embedded platform. This thesis proposes the following: 1. A machine learning based approach for the forward collision warning system in an autonomous vehicle. 2. 3-D object detection using Lidar and camera, based primarily on Lidar point clouds. The proposed forward collision warning model is based on a forward-facing automotive radar that provides sensed input values such as acceleration, velocity, and separation distance to a classifier algorithm, which, on the basis of a supervised learning model, alerts the driver of a possible collision.
Decision Trees, Linear Regression, Support Vector Machine, Stochastic Gradient Descent, and a Fully Connected Neural Network are used for prediction. The second proposed method uses an object detection architecture that combines 2D object detectors with contemporary 3D deep learning techniques. In this approach, a 2D object detector is used first to propose 2D bounding boxes on the images or video frames. A 3D object detection technique is then applied, in which the point clouds are instance segmented and, based on the density of the raw point clouds, a 3D bounding box is predicted around the previously segmented objects.
- Conference Article
- 10.1117/12.2277974
- Oct 5, 2017
In the field of security and defense, it is extremely important to reliably detect moving objects, such as cars, ships, drones and missiles. Detection and analysis of moving objects in cameras near borders could be helpful to reduce illicit trading, drug trafficking, irregular border crossing, trafficking in human beings and smuggling. Many recent benchmarks have shown that convolutional neural networks are performing well in the detection of objects in images. Most deep-learning research effort focuses on classification or detection on single images. However, the detection of dynamic changes (e.g., moving objects, actions and events) in streaming video is extremely relevant for surveillance and forensic applications. In this paper, we combine an end-to-end feedforward neural network for static detection with a recurrent Long Short-Term Memory (LSTM) network for multi-frame analysis. We present a practical guide with special attention to the selection of the optimizer and batch size. The end-to-end network is able to localize and recognize the vehicles in video from traffic cameras. We show an efficient way to collect relevant in-domain data for training with minimal manual labor. Our results show that the combination with LSTM improves performance for the detection of moving vehicles.
- Book Chapter
- 10.1007/978-981-99-1414-2_38
- Jan 1, 2023
Object detection is a primary task in computer vision. Researchers widely use various CNNs to improve the classification and detection of objects in video frames. Object detection is a key task in self-driving cars, satellite imagery, robotics, etc. The proposed work focuses on improving object classification and detection in videos for video analytics. Its key focus is the identification and tuning of hyper-parameters in deep learning models. Deep learning-based object detection models are broadly classified into two categories: one-stage detectors and two-stage detectors. We selected a one-stage detector for experimentation. In this paper, a custom CNN model with hyper-parameter tuning is presented, and the results are compared with state-of-the-art models. It is found that hyper-parameter tuning of CNN models helps improve the object classification and detection accuracy of deep learning models.
- Research Article
- 10.55041/ijsrem41361
- Feb 4, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Object detection is a fundamental task in computer vision that involves identifying and localizing objects within images or video frames. This research focuses on implementing and evaluating the YOLOv5 (You Only Look Once version 5) model for real-time object detection. YOLOv5 is known for its efficiency, accuracy, and speed, making it a preferred choice for various applications such as autonomous driving, surveillance, and medical imaging. In this study, we explore the architecture, training process, and performance evaluation of YOLOv5 on benchmark datasets. The results demonstrate that YOLOv5 achieves high precision and recall, outperforming traditional object detection methods in terms of speed and accuracy. Keywords: Object Detection, YOLOv5, Deep Learning, Computer Vision, Real-Time Processing, Neural Networks.
- Conference Article
- 10.1109/rait.2018.8389022
- Mar 1, 2018
In video surveillance, one existing problem is the detection of objects in a video sequence. This paper focuses on detecting a particular object of interest in a video sequence with higher accuracy. The main objective is to detect the object that appears in both a query image and the video sequence. The work is motivated by surveillance, where the main interest is in monitoring the activity of a pre-specified person or a pre-defined object. For this purpose, a co-segmentation-based common object detection technique is implemented to detect the particular object of interest in the video sequence. The main goal of this paper is to establish a system that detects the query or pre-defined object(s) in the video sequence. The work assumes that all video frames contain the single object of interest (OI) to be detected. The effectiveness of the proposed work is compared with existing detection and tracking algorithms in the literature. The proposed system proves effective and reliable, and the results are expected to be more accurate in detecting the pre-defined object of interest.
- Book Chapter
- 10.1007/978-981-19-5331-6_1
- Nov 8, 2022
Object localization and detection is the emerging area of computer vision and image processing that finds out the location of the object in the video frames or in the digital images. Object localization and detection have many challenges like occlusion, scale variations, intraclass similarities, illumination conditions, pose variations, etc. Multi-view object detection focuses on different views of objects like the top, bottom, side view, etc. We aim to discuss the different Object detection models based on Machine learning and Deep neural network approaches in a multi-view environment. We also discuss different datasets used for object detections and their applications. We apply some object detection models like SSD and Faster R-CNN to specific object categories of Open Image Dataset V6 and compare the results based on mAP (Mean Average Precision). Keywords: Object detection, Convolution neural networks, R-CNN, Fast R-CNN, Faster R-CNN, EfficientDet, CenterNet
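Since the chapter above compares detectors by mAP, a short sketch of how that metric is commonly computed may help. This is the generic definition (area under the precision-recall curve with all-point interpolation, averaged over classes), not necessarily the evaluation protocol used in the chapter or by Open Images. The function names are hypothetical, and each detection is assumed to have already been marked as a hit or a miss, typically by an IoU threshold against the ground truth.

```python
from typing import Dict, List, Tuple

ScoredHit = Tuple[float, bool]  # (confidence score, matched a ground-truth box?)


def average_precision(scored_hits: List[ScoredHit], num_gt: int) -> float:
    """Area under the precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda s: s[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall level or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[ScoredHit], int]]) -> float:
    """Mean of per-class AP; per_class maps class name -> (detections, num_gt)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Sorting by confidence first is what makes the metric sensitive to ranking: a false positive with a high score lowers AP more than one with a low score.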
- Conference Article
- 10.1109/prml52754.2021.9520732
- Jul 16, 2021
The task of object detection, one of the core problems in computer vision, is to find all objects of interest in an image and determine their classes and positions. Since the emergence of AlexNet, convolutional neural networks have held a dominant position in computer vision, and research on convolutional neural networks and algorithm structures has become increasingly in-depth. Object detection algorithms can be roughly divided into two categories: candidate-based (two-stage) and regression-based (one-stage). Candidate-region-based algorithms are highly accurate, but their structure is complex and detection is slow. Regression-based algorithms have a simple structure and fast detection, which gives them high application value in real-time object detection, but their accuracy is relatively low. In pursuit of both speed and accuracy, researchers have tried applying mainstream methods from other fields. Transformers from the NLP field have therefore recently been used in computer vision, for example ViT and Swin Transformer. These works showed that transformer-based models perform similarly to or better than neural network algorithms and pointed out new paths for researchers. This paper introduces classic neural networks, discusses the advantages and disadvantages of convolutional neural networks used in object detection algorithms, and introduces the latest innovative Transformer methods used in computer vision. Finally, the difficulties, challenges, and future development of convolutional neural networks and Transformers in object detection are considered.
- Research Article
- 10.32362/2500-316x-2023-11-4-26-35
- Aug 1, 2023
- Russian Technological Journal
Objectives. At present, increasing rates of pollution of vast areas by various types of household waste are becoming an increasingly serious problem. In this connection, the creation of a robotic complex capable of performing autonomous litter collection functions becomes an urgent need. One of the key components of such a complex comprises a vision system for detecting and interacting with target objects. The purpose of this work is to develop the underlying algorithmics for the vision system of robots executing area cleaning functions. Methods. Within the framework of the proposed structure of the system for visual analysis of the external environment, algorithms for detecting and classifying objects of various appearance have been developed using convolutional neural networks. The neural network detector was set up by gradient descent on the open dataset of TACO training samples. To determine the geometric parameters of a surface in the field of view of the robot and estimate the coordinates of objects on the ground, a homography matrix was formed to take into account information about the characteristics and location of the video camera. Results. The developed software and algorithms for a mobile robot equipped with a monocular video camera are capable of implementing the functions of neural network detection and classification of litter objects in the frame, as well as projection of found objects on a terrain map for their subsequent collection. Conclusions. Experimental studies have shown that the developed system of visual analysis of the external environment of an autonomous mobile robot has sufficient efficiency to solve the tasks of detecting litter in the field of view of an autonomous mobile robot.
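The homography step mentioned in the abstract above, mapping a detection in the image to coordinates on the ground, reduces to a 3x3 matrix applied in homogeneous coordinates. The following is a minimal sketch, not the article's code. The function names are hypothetical, the matrix is assumed to be already known from the camera's characteristics and pose, and using the bottom-center of a bounding box as the ground contact point is an assumption of this example.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major


def project_to_ground(h: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to ground-plane coordinates with a 3x3 homography."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        # The pixel lies on the image of the plane's line at infinity (the horizon).
        raise ValueError("point maps to infinity")
    return x / w, y / w


def box_ground_point(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Bottom-center of a box (x1, y1, x2, y2), taken as the point where
    the object touches the ground (an assumption of this sketch)."""
    x1, _, x2, y2 = box
    return (x1 + x2) / 2.0, y2
```

Dividing by the third homogeneous component is what produces the perspective effect: pixels closer to the horizon map to ground points that are farther away.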
- Conference Article
- 10.1109/nnsp.1994.366014
- Sep 6, 1994
A supervised neural network (NN) algorithm was used for automated detection of ischemic episodes resulting from ST segment elevation or depression. The performance of the method was measured using the European ST-T database. In particular, the performance was measured in terms of beat-by-beat ischemia detection and in terms of ischemic episode detection. Aggregate statistics were used to describe the detector performance because of the small number of events. The algorithm used to train the NN was an adaptive backpropagation (BP) algorithm. This algorithm dramatically reduces training time (a 10-fold decrease in our case) compared to the classical BP algorithm. The resulting NN is capable of detecting ischemia independently of the lead used. It was found that the average ischemia episode sensitivity is 88.62%, while the average ischemia sensitivity is 72.22%. This drop in ischemia sensitivity could be attributed to the diverse statistical properties of the ECGs within the same patient. The results show that NNs can be used in ECG processing in cases where fast and reliable detection of ischemic episodes is desired, as in critical care units (CCUs).