Motion-based segmentation and contour-based classification of video objects
The segmentation of objects in video sequences constitutes a prerequisite for numerous applications ranging from computer vision tasks to second-generation video coding. We propose an approach for segmenting video objects based on motion cues. To estimate motion we employ the 3D structure tensor, an operator that provides reliable results by integrating information from a number of consecutive video frames. We present a new hierarchical algorithm, embedding the structure tensor into a multiresolution framework to allow the estimation of large velocities. The motion estimates are included as an external force into a geodesic active contour model, thus stopping the evolving curve at the moving object's boundary. A level set-based implementation allows the simultaneous segmentation of several objects. As an application based on our object segmentation approach we provide a video object classification system. Curvature features of the object contour are matched by means of a curvature scale space technique to a database containing preprocessed views of prototypical objects. We provide encouraging experimental results calculated on synthetic and real-world video sequences to demonstrate the performance of our algorithms.
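As an illustration of the tensor-based motion estimate described in this abstract, the following minimal sketch accumulates the 3D structure tensor over a small spatiotemporal block and solves for a single velocity vector. It is a simplification under stated assumptions, not the authors' hierarchical algorithm: the function names are hypothetical, there is no smoothing or multiresolution pyramid, and the velocity comes from an ordinary least-squares solve on the tensor entries rather than from an eigenvalue analysis.

```python
import math


def make_translating_volume(frames, height, width, vx, vy):
    """Hypothetical helper: a smooth pattern shifted by (vx, vy) pixels per frame."""
    return [[[math.sin(0.3 * (x - vx * t)) + math.cos(0.2 * (y - vy * t))
              for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


def structure_tensor_velocity(volume):
    """Estimate one (vx, vy) for a block indexed as volume[t][y][x].

    The tensor J is the sum of grad(f) * grad(f)^T over the block, with
    grad(f) = (fx, fy, ft) taken by central differences. Brightness
    constancy (fx*vx + fy*vy + ft = 0) then gives the 2x2 linear system
    [Jxx Jxy; Jxy Jyy] [vx; vy] = -[Jxt; Jyt].
    Returns None when the system is singular (aperture problem).
    """
    frames, height, width = len(volume), len(volume[0]), len(volume[0][0])
    jxx = jxy = jyy = jxt = jyt = 0.0
    for t in range(1, frames - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                fx = (volume[t][y][x + 1] - volume[t][y][x - 1]) / 2.0
                fy = (volume[t][y + 1][x] - volume[t][y - 1][x]) / 2.0
                ft = (volume[t + 1][y][x] - volume[t - 1][y][x]) / 2.0
                jxx += fx * fx
                jxy += fx * fy
                jyy += fy * fy
                jxt += fx * ft
                jyt += fy * ft
    det = jxx * jyy - jxy * jxy
    if abs(det) < 1e-12:
        return None
    vx = (-jyy * jxt + jxy * jyt) / det
    vy = (jxy * jxt - jxx * jyt) / det
    return vx, vy
```

Calling `structure_tensor_velocity(make_translating_volume(5, 12, 12, 1.0, -0.5))` should return a velocity close to the true shift. Because central differences only approximate the derivatives, the estimate degrades for displacements much larger than a pixel, which is the limitation a multiresolution scheme is meant to address.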
- Conference Article
2
- 10.1117/12.502558
- Jun 16, 2003
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
Unsupervised motion-based object segmentation refined by color
- Research Article
6
- 10.1016/j.image.2020.115858
- Apr 20, 2020
- Signal Processing: Image Communication
Video object tracking and segmentation with box annotation
- Book Chapter
1
- 10.1007/978-981-10-3002-4_26
- Jan 1, 2016
In this paper, we propose a new method to detect and segment the foreground object in a video automatically. Given a video sequence, our method begins by generating proposal bounding boxes in each frame, according to both static and motion cues. The boxes are used to detect the primary object in the sequence. We score each box by its likelihood of containing a foreground object, connect boxes in adjacent frames, and calculate the similarity between them. A layered Directed Acyclic Graph is constructed to select the object box in each frame. With the help of the object boxes, we model the motion and appearance of the object. Motion cues and appearance cues are combined in an energy minimization framework to obtain a coherent foreground object segmentation over the whole video. Our method achieves results comparable with state-of-the-art methods on a challenging benchmark dataset.
- Research Article
7
- 10.7763/ijcte.2010.v2.248
- Jan 1, 2010
- International Journal of Computer Theory and Engineering
Video object segmentation has emerged as one of the most important and challenging areas of research. Its principal objective is to facilitate content-based representation by extracting objects of interest from a series of consecutive video frames. A number of video object segmentation algorithms have been proposed recently, but most are not robust enough to process noisy video sequences. The performance of most segmentation techniques is degraded by noise in the frames, which makes edge preservation a critical issue. This paper presents a novel video object segmentation approach for noisy color video sequences, aimed at effective video retrieval. First, the noisy video frames are denoised using a strategy based on an enhanced sparse representation in the transform domain. Next, the background is estimated from the denoised frames using the Expectation Maximization (EM) algorithm. Then, the foreground objects (i.e., the moving video objects) are segmented using the proposed approach, which employs the biorthogonal wavelet transform and the L2 norm distance measure. The experimental results demonstrate the effectiveness of the presented approach in segmenting video objects from noisy color video sequences.
- Conference Article
3
- 10.1145/2733373.2806324
- Oct 13, 2015
This paper investigates how to exploit eye gaze data for understanding visual content. In particular, we propose a human-in-the-loop approach for object segmentation in videos, where humans provide significant cues on spatiotemporal relations between object parts (i.e., superpixels in our approach) by simply looking at video sequences. Such constraints, together with object appearance properties, are encoded into an energy function so as to tackle the segmentation problem as a labeling one. The proposed method uses gaze data from only two people and was tested on two challenging visual benchmarks: 1) SegTrack v2 and 2) FBMS-59. The results show that our method outperformed more complex video object segmentation approaches while reducing the effort needed for collecting human feedback.
- Conference Article
2
- 10.1109/mmsp.2005.248622
- Oct 1, 2005
We discuss a new video analysis approach for coherent key-frame extraction and object segmentation. As two basic units for content-based video analysis, key-frame extraction and object segmentation are usually implemented independently and separately based on different feature sets. Our previous work showed that by exploiting the inherent relationship between key-frames and objects, a set of salient key-frames can be extracted to support robust and efficient object segmentation. This work extends the previous numerical studies by suggesting a new analytical approach that jointly formulates key-frame extraction and object segmentation via a statistical mixture model in which the concept of frame/pixel saliency is introduced. A modified expectation maximization algorithm is developed for model estimation, leading to the most salient key-frames for object segmentation. Simulations on both synthetic and real videos show the effectiveness and efficiency of the proposed method.
- Conference Article
4
- 10.1109/icip.2000.899356
- Jan 1, 2000
This paper examines the problem of segmentation and tracking of video objects for a content-based information retrieval context. Our method starts first with an interactive video object selection, then alternately tracks and fits the object of interest as long as possible. A user-based selection is required in order to initialize the process, whereas an active contour model progressively refines the selection by fitting the natural edges of the object. The video object is thus tracked by using a hybrid structure combining a hierarchical mesh for the motion estimation between two frames and a multi-resolution active contour model. This contour model is derived directly from the mesh boundaries in order to reposition the snake's nodes onto the natural edges of the object.
- Conference Article
1
- 10.1109/icosst48232.2019.9043975
- Dec 1, 2019
Object segmentation, detection, and tracking in videos are among the most important tasks in computer vision and are necessary in real-time deployed surveillance systems. Various unsupervised and semi-supervised video object segmentation techniques have been implemented and have shown effective results. However, these techniques process every frame of a video sequence, which requires a large amount of training data and a long computation time. In this paper, a semi-supervised technique is proposed that segments an object in a video by processing only a single frame of the sequence. In this framework, a fully convolutional network separates the foreground from the image, creates a mask of the object, and then segments the object with the help of this mask. The foreground separation in a frame is done using a pre-trained network, while the rest of the network is trained and tested on the DAVIS dataset. The results show that the proposed framework takes less computation time and improves the overall accuracy of video object segmentation by 10% compared to previous techniques.
- Conference Article
- 10.1145/3293353.3293381
- Dec 18, 2018
Video object segmentation aims to segment objects in a video sequence, given some user annotation which indicates the object of interest. Although Convolutional Neural Networks (CNNs) have recently been used for foreground segmentation in videos, adversarial training methods have not been used effectively to solve this problem, despite their extensive use for many other problems in computer vision. Earlier, flow features and motion trajectories have been extensively used to capture the temporal consistency between subsequent frames to segment moving objects in videos. However, we show that our proposed framework, which processes the video frames independently using a deep generative adversarial network (GAN), is able to maintain temporal coherency across frames without any explicit trajectory-based information and provides superior results. Our main contribution lies in introducing a GAN-based framework, together with a novel cost function based on the Intersection-over-Union score for training the model, to solve the problem of foreground object segmentation in videos. The proposed method, when evaluated on popular real-world video segmentation datasets, viz. DAVIS, SegTrack-v2 and YouTube-Objects, exhibits a substantial performance gain over recent state-of-the-art methods.
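The abstract above does not specify its Intersection-over-Union based cost function. As a point of reference only, the sketch below shows one common differentiable ("soft") IoU loss for a predicted foreground probability mask. The function name, the `eps` stabilizer, and the exact formula are assumptions for illustration, not the cost function used in that paper.

```python
def soft_iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss between two 2D masks given as lists of rows.

    pred holds foreground probabilities in [0, 1]; target holds 0/1 labels.
    intersection = sum(p * g), union = sum(p + g - p * g).
    The loss is 1 - IoU, so 0 means perfect overlap and values near 1 mean
    no overlap. eps (an assumed stabilizer) avoids division by zero when
    both masks are empty.
    """
    intersection = 0.0
    union = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, g in zip(pred_row, target_row):
            intersection += p * g
            union += p + g - p * g
    return 1.0 - (intersection + eps) / (union + eps)
```

Unlike a per-pixel cross-entropy, a loss of this form is insensitive to the number of background pixels, which is one reason IoU-style objectives are used when the foreground object is small.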
- Conference Article
- 10.2991/emeit.2012.455
- Jan 1, 2012
A novel video object segmentation method that combines change detection and edge detection is proposed here. The algorithm can be divided into three parts: motion detection, spatial segmentation, and a temporal-spatial filter that integrates the spatial and temporal information of the video sequence. The motion detection step makes use of the t-distribution significance test; the initial motion detection masks of the frames in the video sequence are then integrated to form the movement mask. For spatial segmentation, the Sobel edge detection operator is used to obtain the boundaries of the video objects in the current frame; the temporal-spatial filter then integrates the temporal and spatial information, extracts the precise boundary of the moving object, and reduces the residual noise. Finally, the segmentation of video objects is refined by filling and morphological operations.
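To make the motion detection step more concrete, here is a minimal sketch of a per-pixel significance test on frame differences. The details are assumptions, since the abstract gives none: a 3x3 window, a one-sample Student's t statistic on the nine difference values, and a two-sided 5% critical value for 8 degrees of freedom (2.306). The function name `change_mask` is hypothetical, and the Sobel and filtering stages are not shown.

```python
import math

# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (nine samples from a 3x3 window). The window size is an assumption.
T_CRIT_DF8 = 2.306


def change_mask(prev_frame, curr_frame, t_crit=T_CRIT_DF8):
    """Flag pixels whose local frame difference is significantly non-zero.

    Frames are 2D lists of gray values. Border pixels are left unflagged
    because the 3x3 window does not fit there.
    """
    height, width = len(curr_frame), len(curr_frame[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            diffs = [curr_frame[y + dy][x + dx] - prev_frame[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            n = len(diffs)
            mean = sum(diffs) / n
            var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
            if var == 0.0:
                # Constant difference: changed only if it is non-zero.
                mask[y][x] = mean != 0.0
            else:
                t_stat = mean / math.sqrt(var / n)
                mask[y][x] = abs(t_stat) > t_crit
    return mask
```

Testing the mean difference against its local spread, instead of thresholding the raw difference, lets the decision adapt to the noise level in each neighborhood.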
- Conference Article
- 10.1109/eurcon.2005.1630113
- Jan 1, 2005
It remains a challenge to automatically segment video objects in a way that is consistent with human visual perception and content understanding. Based on an extensive literature survey of existing video object segmentation algorithms, we propose in this paper an integrated approach for semantic video object segmentation, in which shape cues in successive frames are taken into account in exploiting the principle of change detection, together with pre- and post-processing to improve segmentation accuracy. In comparison with existing segmentation techniques, our contributions are: (i) pre-processing via image sharpening to strengthen edge detection and improve change detection; (ii) localizing the positional range of the moving object to reduce computation, followed by differencing the two frames within that range to extract the moving object mask. Benchmarked against a well-reported existing algorithm, our proposed algorithm achieves a certain level of improvement and also supports automatic semantic object segmentation in videos.
- Research Article
20
- 10.1109/tip.2018.2859622
- Jul 30, 2018
- IEEE Transactions on Image Processing
It is a challenging task to extract the segmentation mask of a target from a single noisy video, which involves object discovery coupled with segmentation. To solve this challenge, we present a method to jointly discover and segment an object from a noisy video, where the target disappears intermittently throughout the video. Previous methods either perform only video object discovery, or perform video object segmentation presuming the existence of the object in each frame. We argue that jointly conducting the two tasks in a unified way is beneficial. In other words, video object discovery and video object segmentation can facilitate each other. To validate this hypothesis, we propose a principled probabilistic model in which two dynamic Markov networks are coupled, one for discovery and the other for segmentation. When conducting Bayesian inference on this model using belief propagation, the bi-directional message passing reveals a clear collaboration between these two inference tasks. We validated our proposed method on five data sets. The first three video data sets, i.e., the SegTrack data set, the YouTube-objects data set, and the Davis data set, are not noisy: all video frames contain the objects. The two noisy data sets, i.e., the XJTU-Stevens data set and the Noisy-ViDiSeg data set, newly introduced in this paper, both have many frames that do not contain the objects. Compared with the state of the art, our method produces inferior results on video data sets without noisy frames but obtains better results on video data sets with noisy frames.
- Conference Article
7
- 10.1109/icme.2000.871574
- Apr 28, 2017
This paper examines the problem of segmentation and tracking of video objects for content-based information retrieval. Segmentation and tracking of video objects play an important role in the index creation and user request definition steps. The object is initially selected using a semi-automatic approach. For this purpose, a user-based selection is required to roughly define the object to be tracked. In this paper, we propose two different methods to allow an accurate contour definition from the user selection. The first is based on an active contour model which progressively refines the selection by fitting the natural edges of the object, while the second uses a binary partition tree with a marker and propagation approach. The video object is then tracked by using a hybrid structure alternately combining a hierarchical mesh for the motion estimation between two frames and a multi-resolution active contour model. This contour model is derived directly from the mesh boundaries in order to reposition the snake's nodes onto the natural edges of the object. The object-based segmentation associated with object tracking allows relevant descriptors to be built for a future matching purpose.
- Research Article
1
- 10.1049/iet-cvi.2018.5376
- Aug 30, 2018
- IET Computer Vision
Tracking of moving objects in video sequences is an important research problem because of its many industrial, biomedical, and security applications. Significant progress has been made on this topic in the last few decades. However, the ability to track objects accurately in video sequences that have challenging conditions and unexpected events, e.g. background motion and shadows; objects with different sizes and contrasts; a sudden change in illumination; partial object camouflage; and low signal‐to‐noise ratio, remains an important research problem. To address such difficulties, the authors developed a robust multiscale visual tracker that represents a captured video frame as different subbands in the wavelet domain. It then applies N independent particle filters to a small subset of these subbands, where the choice of this subset of wavelet subbands changes with each captured frame. Finally, it fuses the outputs of these N independent particle filters to obtain final position tracks of multiple moving objects in the video sequence. To demonstrate the robustness of their multiscale visual tracker, they applied it to four example videos that exhibit different challenges. Compared to a standard full‐resolution particle filter‐based tracker and a single wavelet subband (LL)2‐based tracker, their multiscale tracker demonstrates significantly better tracking performance.
- Conference Article
34
- 10.1109/icip.2001.958427
- Oct 7, 2001
We propose an approach to the segmentation of video objects based on motion cues. Motion analysis is performed by estimating local orientations in the spatiotemporal domain using the three-dimensional structure tensor. These estimates are integrated as an external force into an active contour model, thus stopping the evolving curve when it reaches the moving object's boundary. To enable simultaneous detection of several objects, we reformulate the tensor-based active contour model using the level-set technique. In addition, a contour refinement technique has been developed to better approximate the real boundary of the moving object. We provide promising experimental results calculated on real-world video sequences widely used within the computer vision community.