
Video Object Segmentation and Tracking

Abstract

Object segmentation and object tracking are fundamental research areas in the computer vision community. Both struggle with common challenges such as occlusion, deformation, motion blur, and scale variation. The former must also cope with heterogeneous objects, interacting objects, edge ambiguity, and shape complexity; the latter has difficulty handling fast motion, out-of-view targets, and real-time processing. Combining the two problems as Video Object Segmentation and Tracking (VOST) can overcome their respective difficulties and improve their performance. VOST can be applied to many practical applications such as video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This survey aims to provide a comprehensive review of the state-of-the-art VOST methods, classify these methods into different categories, and identify new trends. First, we broadly categorize VOST methods into Video Object Segmentation (VOS) and Segmentation-based Object Tracking (SOT). Each category is further classified into various types based on the segmentation and tracking mechanism. Moreover, we present some representative VOS and SOT methods from each period. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and provide a variety of evaluation metrics. Finally, we point out a set of interesting future works and draw our own conclusions.

Similar Papers
  • Research Article
  • Cite Count Icon 33
  • 10.1109/tcsvt.2013.2242595
Video Object Segmentation and Tracking Framework With Improved Threshold Decision and Diffusion Distance
  • Jun 1, 2013
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shao-Yi Chien + 3 more

Video object segmentation and tracking are two essential building blocks of smart surveillance systems. However, there are several issues that need to be resolved. Threshold decision is a difficult problem for video object segmentation with a multi-background model. In addition, some conditions make robust video object tracking difficult. These conditions include nonrigid object motion, target appearance variations due to changes in illumination, and background clutter. In this paper, a video object segmentation and tracking framework is proposed for smart cameras in visual surveillance networks with two major contributions. First, we propose a robust threshold decision algorithm for video object segmentation with a multi-background model. Second, we propose a video object tracking framework based on a particle filter with the likelihood function composed of diffusion distance for measuring color histogram similarity and motion clue from video object segmentation. The proposed framework can track nonrigid moving objects under drastic changes in illumination and background clutter. Experimental results show that the presented algorithms perform well for several challenging sequences, and our proposed methods are effective for the aforementioned issues.

  • Conference Article
  • Cite Count Icon 9
  • 10.1109/icmlc.2008.4620823
A shot boundary detection method for news video based on object segmentation and tracking
  • Jul 1, 2008
  • Xin-Wen Xu + 2 more

As a critical step in many multimedia applications, shot boundary detection has attracted much research interest in recent years. Most existing methods measure the similarity among video frames based on their low-level features. However, they are sensitive not only to changes in brightness, color, and object motion, but also to camera motion and video quality. This paper proposes an innovative shot boundary detection method for news video based on video object segmentation and tracking. It combines two main techniques: the partitioned histogram comparison method, and video object segmentation and tracking based on wavelet analysis. The partitioned histogram comparison is used as the first filter to effectively reduce the number of video frames that need object segmentation and tracking. The unsupervised video object segmentation and tracking based on wavelet analysis is robust to the problems mentioned above. The efficacy of the proposed method is extensively tested with more than 3 hours of CCTV and CNN news programs, achieving 96.4% recall with 97.2% precision.
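The idea of using a partitioned histogram comparison as a cheap first filter can be illustrated with a minimal sketch. This is not the authors' implementation: the grid size, bin count, distance measure (half the L1 distance between normalized block histograms), threshold, and all function names are assumptions chosen for illustration. Frames are plain lists of grayscale rows.

```python
def block_histograms(frame, grid=2, bins=4, max_value=256):
    """Split a grayscale frame into grid x grid blocks; return one normalized histogram per block."""
    height, width = len(frame), len(frame[0])
    histograms = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * bins
            for y in range(by * height // grid, (by + 1) * height // grid):
                for x in range(bx * width // grid, (bx + 1) * width // grid):
                    hist[frame[y][x] * bins // max_value] += 1
            total = sum(hist) or 1  # guard against empty blocks
            histograms.append([count / total for count in hist])
    return histograms


def partitioned_difference(frame_a, frame_b, grid=2, bins=4):
    """Mean over blocks of half the L1 histogram distance (0 = identical, 1 = disjoint)."""
    hists_a = block_histograms(frame_a, grid, bins)
    hists_b = block_histograms(frame_b, grid, bins)
    per_block = [
        0.5 * sum(abs(p - q) for p, q in zip(ha, hb))
        for ha, hb in zip(hists_a, hists_b)
    ]
    return sum(per_block) / len(per_block)


def candidate_boundaries(frames, threshold=0.5):
    """Indices i where frame i differs strongly from frame i-1 (assumed threshold)."""
    return [
        i for i in range(1, len(frames))
        if partitioned_difference(frames[i - 1], frames[i]) > threshold
    ]


dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(candidate_boundaries([dark, dark, bright, bright]))
```

Only the frames flagged by such a filter would then need the more expensive segmentation and tracking stage.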

  • Research Article
  • Cite Count Icon 40
  • 10.1109/tcsvt.2004.828347
Robust Segmentation and Tracking of Colored Objects in Video
  • Jun 1, 2004
  • IEEE Transactions on Circuits and Systems for Video Technology
  • T Gevers

Segmenting and tracking of objects in video is of great importance for video-based encoding, surveillance, and retrieval. However, the inherent difficulty of object segmentation and tracking is to distinguish changes in the displacement of objects from disturbing effects such as noise and illumination changes. Therefore, in this paper, we formulate a color-based deformable model which is robust against noisy data and changing illumination. Computational methods are presented to measure color constant gradients. Further, a model is given to estimate the amount of sensor noise through these color constant gradients. The obtained uncertainty is subsequently used as a weighting term in the deformation process. Experiments are conducted on image sequences recorded from three-dimensional scenes. From the experimental results, it is shown that the proposed color constant deformable method successfully finds object contours robust against illumination, and noisy, but homogeneous regions.

  • Conference Article
  • Cite Count Icon 34
  • 10.1109/wacv56688.2023.00172
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video
  • Jan 1, 2023
  • Ali Athar + 6 more

Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g., $\mathcal{J}\&\mathcal{F}$, mAP, sMOTSA). As a result, published works usually target a particular benchmark, and are not easily comparable to one another. We believe that the development of generalized methods that can tackle multiple tasks requires greater cohesion among these research sub-communities. In this paper, we aim to facilitate this by proposing BURST, a dataset which contains thousands of diverse videos with high-quality object masks, and an associated benchmark with six tasks involving object tracking and segmentation in video. All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison, and hence, more effectively pool knowledge from different methods across different tasks. Additionally, we demonstrate several baselines for all tasks and show that approaches for one task can be applied to another with a quantifiable and explainable performance difference. Dataset annotations are available at: https://github.com/Ali2500/BURST-benchmark.
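For readers unfamiliar with the metrics named above, the region-similarity part of the J&F measure is the Jaccard index (intersection over union) between a predicted mask and a ground-truth mask. The sketch below shows only that J term for a single frame; the boundary term F is omitted, and the function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def region_similarity(pred_mask, gt_mask):
    """Jaccard index (IoU) between two binary masks given as lists of rows."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [[1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 0, 0]]
print(region_similarity(pred, gt))  # one shared pixel out of three covered pixels
```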

  • Research Article
  • Cite Count Icon 22
  • 10.1016/s0923-5965(00)00055-2
2-D mesh-based video object segmentation and tracking with occlusion resolution
  • Jul 11, 2001
  • Signal Processing: Image Communication
  • Işıl Celasun + 3 more


  • Conference Article
  • Cite Count Icon 1
  • 10.1109/icosst48232.2019.9043975
Object Segmentation in Video Sequences by using Single Frame Processing
  • Dec 1, 2019
  • Muhammad Hamza Bhatti + 2 more

Object segmentation, detection, and tracking in videos is one of the most important tasks in computer vision. It is necessary in all deployed real-time surveillance systems. Various unsupervised and semi-supervised video object segmentation techniques have been implemented and have shown efficient results. However, all of these techniques process every frame of a video sequence, which requires a large amount of training data and results in long computation times. In this paper, a semi-supervised technique is proposed that segments an object in a video by processing just a single frame of the sequence. In this framework, a fully convolutional network is used to separate the foreground from the image, create the mask of the object, and then segment the object with the help of this mask. The foreground separation in a frame is done using a pre-trained network, while training and testing of the rest of the network are done using the DAVIS dataset. The results show that the proposed framework takes less computational time and also improves the overall accuracy of video object segmentation by 10% compared to previous techniques.

  • Book Chapter
  • Cite Count Icon 1
  • 10.1007/978-981-19-1018-0_57
Analysis of Multifeatured Threshold Filtered-Based Real-Time Video Segmentation and Tracking in Video Surveillance
  • Jan 1, 2022
  • T Kusuma + 1 more

Moving object segmentation and detection have become an important topic in computer vision. As such, they are widely used in video surveillance applications such as driving assistance programs, robots, traffic monitoring, and crime pattern identification. In addition, video object tracking is an important function in video surveillance systems because it provides temporary interactive information about moving objects. An important function of video object segmentation is to find and separate important elements in the video frame behind the domain. The purpose of video tracking is to combine targeted objects into consecutive video frames. First of all, enhanced threshold filtered video object detection and tracking (TFVODT) is designed to classify objects according to their size and color, and to achieve better accuracy in video object detection. Initially, the TFVODT framework distinguishes a video object by its characteristics such as size and color. The TFVODT framework performs the function of distinguishing an object through the median filter-based enhanced Laplacian thresholding process. Along with the support of the split object, the TFVODT framework does well to track the video object. Second, threshold filtered video object detection and tracking (ITFVODT) is developed to distinguish a video's elements based on their features such as texture, durability, and performance of video object detection. All video frames found in the ITFVODT framework contain similar features such as quality and contrast.

Keywords: Object tracking; ITFVODT; TFVODT; EMFVD; Segmentation

  • Conference Article
  • Cite Count Icon 7
  • 10.1109/icme.2000.871574
Segmentation and tracking of video objects for a content-based video indexing context
  • Apr 28, 2017
  • M Maziere + 3 more

This paper examines the problem of segmentation and tracking of video objects for content-based information retrieval. Segmentation and tracking of video objects play an important role in the index creation and user request definition steps. The object is initially selected using a semi-automatic approach. For this purpose, a user-based selection is required to roughly define the object to be tracked. In this paper, we propose two different methods to allow an accurate contour definition from the user selection. The first is based on an active contour model, which progressively refines the selection by fitting the natural edges of the object, while the second uses a binary partition tree with a marker and propagation approach. The video object is then tracked using a hybrid structure that alternately combines a hierarchical mesh for the motion estimation between two frames and a multi-resolution active contour model. This contour model is derived directly from the mesh boundaries in order to reposition the snake's nodes onto the natural edges of the object. The object-based segmentation associated with object tracking allows relevant descriptors to be built for a future matching purpose.

  • Conference Article
  • Cite Count Icon 5
  • 10.1109/icmlc.2005.1527816
Moving object segmentation and tracking in video
  • Jan 1, 2005
  • Chun-Ming Li + 5 more

Moving object segmentation and tracking in video is an important task not only in computer motion detection and tracking, but also in MPEG-4. A new moving object segmentation and tracking method based on the improved PCA is presented in this paper. First, the improved PCA is used to segment the moving object in the original image sequence. In this step, three frames are enough to segment rigid and non-rigid moving objects from the background. Second, tracking is performed by shifting the three-frame window along the image sequence and repeating the first step in each window.

  • Conference Article
  • Cite Count Icon 12
  • 10.1117/12.509859
Performance measures for video object segmentation and tracking
  • Jun 16, 2003
  • Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
  • Cigdem E Erdem + 2 more

We propose measures to evaluate the performance of video object segmentation and tracking methods quantitatively without ground-truth segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its neighbors. They can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad; and/or combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without ground truth has been demonstrated by canonical correlation analysis of the proposed measures with another set of measures with ground truth on a set of sequences (where ground-truth information is available). Experimental results are presented to evaluate the segmentation maps obtained from various sequences using different segmentation and tracking algorithms.

  • Research Article
  • Cite Count Icon 148
  • 10.1109/tip.2004.828427
Performance Measures for Video Object Segmentation and Tracking
  • Jul 1, 2004
  • IEEE Transactions on Image Processing
  • C.E Erdem + 2 more

We propose measures to evaluate quantitatively the performance of video object segmentation and tracking methods without ground-truth (GT) segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its predecessors. They can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad; and/or they can be combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without GT has been demonstrated by canonical correlation analysis with another set of measures with GT on a set of sequences (where GT information is available). Experimental results are presented to evaluate the segmentation maps obtained from various sequences using different segmentation approaches.

  • Book Chapter
  • 10.1007/978-3-030-34120-6_31
Enhanced Video Segmentation with Object Tracking
  • Jan 1, 2019
  • Zheran Hong + 5 more

The high efficiency and superior performance of the fully convolutional network (FCN) architecture have made employing FCNs for the video object segmentation task a recent trend. However, these FCN-based methods usually ignore the motion information between frames, which may lead to similar object inference or background clutter issues. To deal with these, we propose to use tracking techniques to improve the performance of video object segmentation. The proposed algorithm performs video object segmentation and tracking simultaneously in a unified framework. The motion information provided by the initial tracking result is then used to reject outliers in the segmentation mask caused by background complexities, such as similar object inference or background clutter. In return, the final segmentation result can be used to supervise the tracking result. In this iterative way, the performance of both tasks is enhanced. Experimental results on the challenging benchmark demonstrate the effectiveness of our proposed method.

  • Research Article
  • Cite Count Icon 2
  • 10.3390/a17080330
Lester: Rotoscope Animation through Video Object Segmentation and Tracking
  • Jul 30, 2024
  • Algorithms
  • Ruben Tous

This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas–Peucker algorithm. Finally, facial traits, pixelation, and a basic rim light effect can optionally be added. The results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots, and diverse backgrounds. The proposed method provides a simpler and more deterministic approach than video-to-video translation pipelines based on diffusion models, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also more feasible than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
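The contour simplification step mentioned above relies on the Douglas–Peucker algorithm, which recursively keeps the point farthest from the chord between a polyline's endpoints whenever that distance exceeds a tolerance. The following is a generic textbook sketch for an open polyline, not code from Lester; the function names and the `epsilon` value are illustrative assumptions.

```python
import math


def _distance_to_line(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate chord: fall back to the distance to the single endpoint.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    farthest_index, farthest_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        distance = _distance_to_line(points[i], first, last)
        if distance > farthest_distance:
            farthest_index, farthest_distance = i, distance
    if farthest_distance > epsilon:
        # Keep the farthest point and simplify both halves independently.
        left = douglas_peucker(points[:farthest_index + 1], epsilon)
        right = douglas_peucker(points[farthest_index:], epsilon)
        return left[:-1] + right  # drop the shared point once
    return [first, last]


contour = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(contour, epsilon=0.5))
```

A larger `epsilon` gives coarser, more schematic outlines; a closed mask contour would first need to be split into open segments before applying this routine.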

  • Research Article
  • Cite Count Icon 67
  • 10.1016/j.patcog.2015.01.025
Real-time and robust object tracking in video via low-rank coherency analysis in feature space
  • Feb 13, 2015
  • Pattern Recognition
  • Chenglizhao Chen + 3 more


  • Research Article
  • Cite Count Icon 1
  • 10.34010/komputika.v12i2.9567
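Analisis Metode Kalman Filter, Particle Filter dan Correlation Filter Untuk Pelacakan Objek [Analysis of the Kalman Filter, Particle Filter, and Correlation Filter Methods for Object Tracking]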
  • Sep 8, 2023
  • Komputika : Jurnal Sistem Komputer
  • Ridho Sholehurrohman + 4 more

Object tracking is a challenging problem in computer vision. Object tracking is divided into two types, single-object and multi-object tracking, depending on the objects being observed. Single-object tracking estimates the target in subsequent frames based on the information given in the first frame. In single-object tracking, five components are often used in discriminative methods: motion models, feature extraction, observation models, model updates, and integration methods. Although various object tracking algorithms have been proposed, tracking still fails because of occlusion, non-rigid target deformation, and other factors. This study implements the Kalman filter, particle filter, and correlation filter methods for object tracking in video data. All three methods can track objects in traffic video data and the script circuit video. In the object tracking calculations and method analysis, the Kalman filter achieves 96.89%, making it more accurate than the other methods. Meanwhile, in average computation time, the correlation filter achieves 26.69 FPS, outperforming the competing methods.

Keywords: Kalman Filter; Particle Filter; Correlation Filter; Object Tracking; Object Tracking in Video
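To make the Kalman filter's predict/update cycle concrete, here is a minimal sketch of a constant-velocity filter tracking one coordinate of an object. It is not the study's implementation: the state model (position and velocity), the simplified diagonal process noise, the noise variances, and the function name are all assumptions for illustration.

```python
def kalman_track(measurements, dt=1.0, process_var=1e-3, meas_var=1.0):
    """Track a 1-D position with a constant-velocity Kalman filter.

    Returns a list of (position, velocity) estimates, one per measurement.
    """
    x, v = measurements[0], 0.0            # initial state guess
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # initial covariance (assumed identity)
    estimates = []
    for z in measurements:
        # Predict: x' = F x with F = [[1, dt], [0, 1]].
        x = x + dt * v
        # P' = F P F^T + Q, with Q = process_var * I (a simplification).
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + process_var
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + process_var
        p00, p01, p10, p11 = n00, n01, n10, n11
        # Update with measurement matrix H = [1, 0] (only position is observed).
        innovation = z - x
        s = p00 + meas_var                 # innovation variance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        x += k0 * innovation
        v += k1 * innovation
        # P = (I - K H) P'
        p00, p01, p10, p11 = (
            (1 - k0) * p00,
            (1 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        estimates.append((x, v))
    return estimates


# Hypothetical object centre moving two pixels per frame.
track = kalman_track([2.0 * t for t in range(30)])
print(track[-1])
```

In a 2-D tracker the same cycle would be run on the horizontal and vertical coordinates of the target's bounding-box centre, typically with a joint state vector.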
