An efficient fully unsupervised video object segmentation scheme using an adaptive neural-network classifier architecture
In this paper, an unsupervised video object (VO) segmentation and tracking algorithm is proposed based on an adaptable neural-network architecture. The proposed scheme comprises: 1) a VO tracking module and 2) an initial VO estimation module. Object tracking is handled as a classification problem and implemented through an adaptive network classifier, which provides better results than conventional motion-based tracking algorithms. Network adaptation is accomplished through an efficient and cost-effective weight updating algorithm, providing a minimum degradation of the previous network knowledge and taking into account the current content conditions. A retraining set is constructed and used for this purpose based on initial VO estimation results. Two different scenarios are investigated. The first concerns extraction of human entities in video conferencing applications, while the second exploits depth information to identify generic VOs in stereoscopic video sequences. Human face/body detection based on Gaussian distributions is accomplished in the first scenario, while segmentation fusion is obtained using color and depth information in the second scenario. A decision mechanism is also incorporated to detect time instances for weight updating. Experimental results and comparisons indicate the good performance of the proposed scheme even in sequences with complicated content (object bending, occlusion).
- Book Chapter
1
- 10.1007/978-3-030-23943-5_22
- Jan 1, 2019
- Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
The detection and tracking of objects in a video is an important problem in many applications. In surveillance and robotic vision, tracking and recognition of objects and their sizes are desired. In this paper, an algorithm to obtain the size of an object in an image or video is presented, based on the relationship between pixels and actual size. The object is mainly tracked by a Kalman filter, and the Log-Polar Phase Correlation method is used to recognize objects in a video more precisely. Objects are tracked from frame to frame. Because the image of an object is deformed in a video by motion of either the camera or the object, a dynamic template for matching is proposed to minimize the error. Simulation results are presented showing the errors in determining the size of objects in an image.
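As a rough illustration of the Kalman-filter tracking step mentioned in this abstract, the following is a minimal sketch of a one-dimensional constant-velocity Kalman filter. The state layout, the noise values `q` and `r`, the initial covariance, and the function names `kalman_step` and `track` are assumptions made for illustration; the abstract does not specify the motion model used in the paper.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state = (position, velocity); cov = 2x2 covariance as nested tuples;
    z = measured position (e.g. an object's centroid coordinate in pixels).
    q and r are assumed process and measurement noise variances.
    """
    pos, vel = state
    (p00, p01), (p10, p11) = cov

    # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
    pos_p = pos + dt * vel
    vel_p = vel
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q

    # Update with H = [1, 0]: only the position is observed.
    innovation = z - pos_p
    s = a00 + r
    k0, k1 = a00 / s, a10 / s
    new_state = (pos_p + k0 * innovation, vel_p + k1 * innovation)
    new_cov = (
        ((1 - k0) * a00, (1 - k0) * a01),
        (a10 - k1 * a00, a11 - k1 * a01),
    )
    return new_state, new_cov


def track(measurements, dt=1.0):
    """Filter a sequence of position measurements, frame by frame."""
    state = (measurements[0], 0.0)
    cov = ((100.0, 0.0), (0.0, 100.0))  # large initial uncertainty (assumed)
    for z in measurements[1:]:
        state, cov = kalman_step(state, cov, z, dt)
    return state
```

Calling `track` on the positions of an object that moves a fixed number of pixels per frame returns a position estimate near the last measurement and a velocity estimate near that per-frame displacement.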
- Research Article
33
- 10.1109/tcsvt.2013.2242595
- Jun 1, 2013
- IEEE Transactions on Circuits and Systems for Video Technology
Video object segmentation and tracking are two essential building blocks of smart surveillance systems. However, there are several issues that need to be resolved. Threshold decision is a difficult problem for video object segmentation with a multi-background model. In addition, some conditions make robust video object tracking difficult. These conditions include nonrigid object motion, target appearance variations due to changes in illumination, and background clutter. In this paper, a video object segmentation and tracking framework is proposed for smart cameras in visual surveillance networks with two major contributions. First, we propose a robust threshold decision algorithm for video object segmentation with a multi-background model. Second, we propose a video object tracking framework based on a particle filter with the likelihood function composed of diffusion distance for measuring color histogram similarity and motion clue from video object segmentation. The proposed framework can track nonrigid moving objects under drastic changes in illumination and background clutter. Experimental results show that the presented algorithms perform well for several challenging sequences, and our proposed methods are effective for the aforementioned issues.
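The particle-filter likelihood described in this abstract can be sketched roughly as follows. Note the substitution: the paper measures color histogram similarity with diffusion distance, whereas this sketch uses the simpler Bhattacharyya coefficient as a stand-in, and it omits the motion cue entirely. The function names and the `sigma` parameter are hypothetical.

```python
import math
import random


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram must contain at least one count")
    return [h / total for h in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def likelihood(candidate_hist, reference_hist, sigma=0.2):
    """Color likelihood of one particle; sigma is an assumed tuning parameter."""
    d2 = 1.0 - bhattacharyya(normalize(candidate_hist), normalize(reference_hist))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def reweight(weights, likelihoods):
    """Multiply prior particle weights by likelihoods and renormalize."""
    raw = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(raw)
    return [x / total for x in raw]


def resample(particles, weights, rng=random):
    """Draw a new particle set with probability proportional to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))
```

A particle whose local color histogram matches the reference receives a likelihood of 1, and the likelihood decays as the histograms diverge, so resampling concentrates particles on regions that resemble the target.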
- Conference Article
2
- 10.1109/icdsp.2002.1028155
- Nov 7, 2002
An adaptive neural network architecture is proposed for efficient video object segmentation and tracking of stereoscopic video sequences. The scheme includes (a) a retraining algorithm for adapting network weights to current conditions; (b) a semantically meaningful object extraction module for creating a retraining set; (c) a decision mechanism, which detects the time instances of a new network retraining. The retraining algorithm optimally adapts network weights by exploiting information of the current conditions and simultaneously minimally degrading the obtained network knowledge. The algorithm results in the minimization of a convex function subject to linear constraints, thus, one minimum exists. Furthermore, a decision mechanism is included to detect the time instances that a new network retraining is required. A description of the current conditions is provided by a segmentation fusion algorithm, which appropriately combines color and depth information.
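To illustrate why a constrained convex problem of this kind has a single minimum, consider one simple instance. This is an assumed form, not the formulation in the paper: the weight change $\Delta w$ is kept as small as possible while satisfying linear equality constraints $A\,\Delta w = b$ derived from the retraining set. The symbols $A$, $b$, and $\lambda$ are introduced here for illustration only, and the paper's constraints may differ (for example, they may be inequalities).

```latex
\begin{aligned}
&\min_{\Delta w}\ \tfrac{1}{2}\,\lVert \Delta w \rVert^2
  \quad \text{subject to} \quad A\,\Delta w = b, \\[4pt]
&\mathcal{L}(\Delta w, \lambda)
  = \tfrac{1}{2}\,\lVert \Delta w \rVert^2 - \lambda^{\top}\,(A\,\Delta w - b), \\[4pt]
&\nabla_{\Delta w}\mathcal{L} = 0
  \;\Rightarrow\; \Delta w = A^{\top}\lambda, \qquad
  A A^{\top}\lambda = b, \\[4pt]
&\Delta w^{*} = A^{\top}\,(A A^{\top})^{-1}\, b
  \quad \text{(assuming $A$ has full row rank).}
\end{aligned}
```

Because the objective is strictly convex and the feasible set is affine, the minimizer $\Delta w^{*}$ is unique in this instance.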
- Book Chapter
3
- 10.1007/3-540-32367-8_11
- Jan 1, 2005
In this chapter, an adaptive neural network architecture is proposed for efficient knowledge extraction in video sequences. The system is focused on video object segmentation and tracking in stereoscopic video sequences. The proposed scheme includes: (a) a retraining algorithm for adapting the network weights to current conditions, (b) a semantically meaningful object extraction module for creating a retraining set and (c) a decision mechanism, which detects the time instances when a new network retraining is activated. The retraining algorithm optimally adapts network weights by exploiting information of the current conditions with a minimal deviation of the network weights. The algorithm results in the minimization of a convex function subject to linear constraints, and thus, one minimum exists. Description of current conditions is provided by a segmentation fusion scheme, which appropriately combines color and depth information. Experimental results on real-life video sequences are presented to indicate the promising performance of the proposed adaptive neural network-based scheme.
- Research Article
- 10.15866/irecos.v8i12.3631
- Dec 31, 2013
- International Review on Computers and Software
This study proposes a framework for evaluating recursive and non-recursive algorithms for motion-based video object detection and tracking. Object detection and tracking is a challenging task. Video-based object detection systems rely on the ability to detect moving objects in video streams, and many approaches have been adopted for this purpose. Several factors should be considered, such as stationary and non-stationary backgrounds, unconstrained environments, varied object motion patterns, and differences in the types of objects being detected and tracked. In this study, recursive and non-recursive algorithms such as frame differencing and Mixture of Gaussians are used to detect objects in motion-based video through foreground and background separation. Object tracking is then performed with the Mean-Shift and Lucas-Kanade optical flow (KLT) tracking algorithms. The detection and tracking timings are calculated for the input video dataset based on the video resolution and frame rate. Based on this evaluation, the recursive detection algorithm combined with Mean-Shift tracking is used to obtain correct detection and tracking of objects in motion-based video.
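The difference between non-recursive and recursive detection can be sketched on small grayscale frames stored as nested lists. Frame differencing is the non-recursive method named in the abstract. For the recursive side, this sketch substitutes a running-average background model as a simpler stand-in for Mixture of Gaussians. The function names, `threshold`, and `alpha` are assumptions.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Non-recursive detection: mark pixels that changed between two frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def update_background(background, frame, alpha=0.05):
    """Recursive model: blend each new frame into a running-average background."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels that deviate from the current background estimate."""
    return frame_difference(background, frame, threshold)
```

The non-recursive variant needs only a short buffer of recent frames, while the recursive variant keeps a single background estimate that is updated with every new frame.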
- Conference Article
5
- 10.1109/avss.2011.6027366
- Aug 1, 2011
Video segmentation and tracking have been important and challenging issues for many video processing applications. A novel spatio-temporal video object segmentation and tracking algorithm is proposed in this paper. This algorithm is based on a multi-agent system and the active contour technique. The multi-agent system is composed of a set of supervisor and explorator agents. The agents communicate with each other, and their behavior is inspired by the active contour technique, more precisely “Level Sets”. We used the DIMA platform to implement this algorithm. Experimental results indicate that the proposed algorithm is more robust than previous approaches.
- Research Article
4
- 10.1007/s11265-014-0921-0
- Jul 9, 2014
- Journal of Signal Processing Systems
In this paper two efficient unsupervised video object segmentation approaches are proposed and thoroughly compared. Both methods are based on the exploitation of depth information, estimated from stereoscopic pairs. Depth is a more efficient semantic descriptor of visual content, since usually an object is located on one depth plane. However, depth information fails to accurately represent the contours of an object mainly due to erroneous disparity estimation and occlusion issues. For this reason, the first approach projects color segments onto depth information in order to address the limitations of both depth and color segmentation; color segmentation usually over-partitions an object into several regions, while depth fails to precisely represent object contours. Depth information is produced through an occlusion compensated disparity field and then a depth map is generated. On the contrary, color segmentation is accomplished by incorporating a modified version of the Multiresolution Recursive Shortest Spanning Tree segmentation algorithm (M-RSST). Next considering the first "Constrained Fusion of Color Segments" (CFCS) approach, a color segments map is created, by applying the M-RSST to one of the stereoscopic channels. In this case video objects are extracted by fusing color segments according to depth similarity criteria. The second method also utilizes the depth segments map. In particular an active contour is automatically initialized onto the boundary of each depth segment, which is usually different from a video object's boundary. Initialization is accomplished by a fitness function that considers different color areas and preserves the shapes of depth segments' boundaries. For acceleration purposes each point of the active contour is associated to an "attractive edge" point and a greedy approach is incorporated so that the active contour converges to its final position. 
Several experiments on real life stereoscopic sequences are performed and extensive comparisons in terms of speed and accuracy indicate the promising performance of both methods.
- Research Article
84
- 10.1109/76.844996
- Jun 1, 2000
- IEEE Transactions on Circuits and Systems for Video Technology
An efficient technique for summarization of stereoscopic video sequences is presented, which extracts a small but meaningful set of video frames using a content-based sampling algorithm. The proposed video-content representation provides the capability of browsing digital stereoscopic video sequences and performing more efficient content-based queries and indexing. Each stereoscopic video sequence is first partitioned into shots by applying a shot-cut detection algorithm so that frames (or stereo pairs) of similar visual characteristics are gathered together. Each shot is then analyzed using stereo-imaging techniques, and the disparity field, occluded areas, and depth map are estimated. A multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm is applied for color and depth segmentation, while fusion of color and depth segments is employed for reliable video object extraction. In particular, color segments are projected onto depth segments so that video objects on the same depth plane are retained, while at the same time accurate object boundaries are extracted. Feature vectors are then constructed using multidimensional fuzzy classification of segment features including size, location, color, and depth. Shot selection is accomplished by clustering similar shots based on the generalized Lloyd-Max algorithm, while for a given shot, key frames are extracted using an optimization method for locating frames of minimally correlated feature vectors. For efficient implementation of the latter method, a genetic algorithm is used. Experimental results are presented, which indicate the reliable performance of the proposed scheme on real-life stereoscopic video sequences.
- Research Article
40
- 10.1109/tcsvt.2004.828347
- Jun 1, 2004
- IEEE Transactions on Circuits and Systems for Video Technology
Segmenting and tracking of objects in video is of great importance for video-based encoding, surveillance, and retrieval. However, the inherent difficulty of object segmentation and tracking is to distinguish changes in the displacement of objects from disturbing effects such as noise and illumination changes. Therefore, in this paper, we formulate a color-based deformable model which is robust against noisy data and changing illumination. Computational methods are presented to measure color constant gradients. Further, a model is given to estimate the amount of sensor noise through these color constant gradients. The obtained uncertainty is subsequently used as a weighting term in the deformation process. Experiments are conducted on image sequences recorded from three-dimensional scenes. From the experimental results, it is shown that the proposed color constant deformable method successfully finds object contours robust against illumination, and noisy, but homogeneous regions.
- Conference Article
20
- 10.1109/icecds.2017.8389917
- Aug 1, 2017
Tracking people or moving objects across a PTZ camera and maintaining a track within a camera is a challenging task in video surveillance applications. The goal of object tracking is to segment a region of interest from a video scene and keep track of its motion, position, and occlusion. Object detection and object classification are the steps that precede tracking an object in a sequence of images. Object detection is performed to check for the existence of objects in a video and to locate them precisely. A detected object can then be classified into various categories such as humans, vehicles, birds, floating clouds, swaying trees, and other moving objects. Object tracking is performed by monitoring an object's spatial and temporal changes during a video sequence, including its presence, position, size, and shape. Object tracking is used in several applications such as video surveillance, robot vision, traffic monitoring, video inpainting, and animation. This paper presents a study on moving object detection and tracking techniques using a PTZ camera.
- Conference Article
4
- 10.1109/iconstem.2017.8261424
- Mar 1, 2017
A visual surveillance system is used to analyze and explain object behaviors. It consists of static and moving object detection and video tracking to understand the events that occur in a scene. The main objective of this paper is to identify the various methods for static and moving object detection as well as tracking of moving objects. Detected objects fall into various classes, such as trees, clouds, persons, and other moving objects. Moving object detection is very challenging for any video surveillance system. Object tracking is used in higher-level applications to find the area where objects are present and the shape of the objects in each frame. A new approach is proposed for efficient object tracking using kernel-based and feature-based tracking methods. Vehicle classification in surveillance videos can be performed with the help of this method. This method requires the shape and appearance of the object. An object contains various features, any of which can be used as the kernel to track the object. Object tracking can be done easily by computing the motion of the kernel across more than two frames. The method is therefore divided into two processes: training and testing of objects in videos. The first process trains on images or frames from videos, producing trained object values based on shape and moving position, with positive and negative vehicle results. These values are stored in a database for testing object values in surveillance video. The second process extracts an image from the video, captures its object value, and tests it against the object values in the database; if the values match, the result is positive and the object is tracked in the given surveillance video. Object matching uses a template matching technique.
- Research Article
6
- 10.15623/ijret.2016.0502055
- Feb 25, 2016
- International Journal of Research in Engineering and Technology
Detection and tracking of moving objects is an important research area in video surveillance applications. Object tracking is used in several applications such as video compression, surveillance, and robot technology. Much recent research has addressed video object detection; however, object detection accuracy and background object detection in video frames still pose demanding issues. In this paper, a novel framework called Threshold Filtered Video Object Detection and Tracking (TFVODT) is designed for effective detection and tracking of moving objects. The TFVODT framework initially takes a video file as input, and the video frames are then segmented using Median Filter-based Enhanced Laplacian Thresholding to improve the video quality by reducing mean square error. Next, a Color Histogram-based Particle Filter is applied to the segmented objects in the TFVODT framework for video object tracking. The Color Histogram-based Particle Filter measures the likelihood function, particle posterior, and particle prior function based on the Bayes Sequential Estimation model to improve object tracking accuracy. Finally, object detection is performed with the help of Improvisation of Enhanced Laplacian Threshold (IELT) to enhance video object detection accuracy and to recognize background moving object detection. The proposed TFVODT framework is evaluated experimentally using video images obtained from the Internet Archive 501(c)(3), and a comparison is made with existing object detection techniques. Experimental evaluation of the TFVODT framework is done with performance metrics such as object segmentation accuracy, Peak Signal to Noise Ratio, object tracking accuracy, Mean Square Error, and object detection accuracy of moving video object frames.
Experimental analysis shows that the TFVODT framework is able to improve the video object detection accuracy by 18% and reduces the Peak Signal to Noise Ratio by 23% when compared to state-of-the-art works.
- Research Article
133
- 10.1109/tpami.2020.2966453
- Jan 13, 2020
- IEEE Transactions on Pattern Analysis and Machine Intelligence
This paper conducts a systematic study on the role of visual attention in video object pattern understanding. By elaborately annotating three popular video segmentation datasets (DAVIS 16, Youtube-Objects, and SegTrack V2) with dynamic eye-tracking data in the unsupervised video object segmentation (UVOS) setting, we quantitatively verified, for the first time, the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgments during dynamic, task-driven viewing. Such novel observations provide in-depth insight into the underlying rationale behind video object patterns. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in the spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in the spatial domain. Our UVOS solution enjoys three major advantages: 1) modular training without using expensive video segmentation annotations, instead using more affordable dynamic fixation data to train the initial video attention module and using existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on four popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance compared with state-of-the-art methods and enjoys fast processing speed (10 fps on a single GPU). Our collected eye-tracking data and algorithm implementations have been made publicly available at https://github.com/wenguanwang/AGS.
- Research Article
- 10.1007/s11042-026-21444-x
- Feb 26, 2026
- Multimedia Tools and Applications
An improved semi-supervised video object segmentation and tracking algorithm for real-time applications
- Conference Article
9
- 10.1109/icmlc.2008.4620823
- Jul 1, 2008
As a critical step in many multimedia applications, shot boundary detection has attracted much research interest in recent years. Most existing methods measure the similarity among video frames based on their low-level features. However, they are sensitive not only to changes in brightness, color, and object motion, but also to camera motion and video quality. This paper proposes an innovative shot boundary detection method for news video based on video object segmentation and tracking. It combines three main techniques: the partitioned histogram comparison method, and video object segmentation and tracking based on wavelet analysis. The partitioned histogram comparison is used as the first filter to effectively reduce the number of video frames that need object segmentation and tracking. The unsupervised video object segmentation and tracking based on wavelet analysis is robust to the problems mentioned above. The efficacy of the proposed method is extensively tested with more than 3 hours of CCTV and CNN news programs, achieving 96.4% recall with 97.2% precision.
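The partitioned histogram comparison used as the first filter can be sketched as follows for grayscale frames stored as nested lists. This is a minimal illustration under assumptions: the grid size, bin count, both thresholds, and all function names are hypothetical, and the paper's actual partitioning and decision rule may differ.

```python
def block_histogram(frame, r0, r1, c0, c1, bins=8, max_value=256):
    """Intensity histogram of the block frame[r0:r1][c0:c1]."""
    hist = [0] * bins
    for r in range(r0, r1):
        for c in range(c0, c1):
            hist[frame[r][c] * bins // max_value] += 1
    return hist


def partitioned_histograms(frame, grid=2, bins=8):
    """Split the frame into grid x grid blocks and histogram each block."""
    rows, cols = len(frame), len(frame[0])
    return [
        block_histogram(
            frame,
            i * rows // grid, (i + 1) * rows // grid,
            j * cols // grid, (j + 1) * cols // grid,
            bins,
        )
        for i in range(grid)
        for j in range(grid)
    ]


def histogram_distance(h1, h2):
    """Normalized L1 distance in [0, 1] between two equal-mass histograms."""
    total = max(sum(h1), 1)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * total)


def is_candidate_boundary(frame_a, frame_b, grid=2, bins=8,
                          block_threshold=0.5, min_changed_blocks=3):
    """Flag a frame pair when enough blocks change strongly (assumed rule)."""
    changed = sum(
        1
        for h1, h2 in zip(partitioned_histograms(frame_a, grid, bins),
                          partitioned_histograms(frame_b, grid, bins))
        if histogram_distance(h1, h2) > block_threshold
    )
    return changed >= min_changed_blocks
```

Only frame pairs flagged by `is_candidate_boundary` would be passed on to the more expensive object segmentation and tracking stage, so a local change confined to one block does not trigger further processing.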