A Novel Video Object Segmentation Approach for Noisy Video Sequences towards Effective Video Retrieval

Abstract

Video object segmentation has emerged as one of the most important and challenging areas of research. Its principal objective is to facilitate content-based representation by extracting objects of interest from a series of consecutive video frames. A number of video object segmentation algorithms have been proposed recently, but most are not robust enough to process noisy video sequences. The performance of most segmentation techniques is degraded by noise in the frames, which makes edge preservation a critical issue. This paper presents a novel video object segmentation approach for noisy color video sequences, aimed at effective video retrieval. First, the noisy video frames are denoised using a strategy based on an enhanced sparse representation in the transform domain. Next, the background is estimated from the denoised frames using the Expectation Maximization (EM) algorithm. The foreground objects, i.e., the moving video objects, are then segmented using the presented approach, which employs the biorthogonal wavelet transform and the L2 norm distance measure. The experimental results demonstrate the effectiveness of the presented approach in segmenting video objects from noisy color video sequences.
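
The abstract does not give the details of the segmentation step, so the following is only a minimal sketch of one ingredient it names: thresholding the L2 norm distance between each pixel and a background estimate. The function names, the threshold of 30.0, and the toy frames are assumptions made for illustration; the denoising, EM background estimation, and biorthogonal wavelet transform are omitted, and this is not the authors' algorithm.

```python
import math

def l2_distance(p, q):
    """Euclidean (L2) distance between two colour pixels given as (R, G, B)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def foreground_mask(frame, background, threshold=30.0):
    """Label a pixel 1 (foreground) when its L2 colour distance to the
    background estimate exceeds the (assumed) threshold, else 0."""
    return [
        [1 if l2_distance(fp, bp) > threshold else 0
         for fp, bp in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Toy 2x2 colour frame against a uniform dark background (hypothetical data).
background = [[(10, 10, 10), (10, 10, 10)],
              [(10, 10, 10), (10, 10, 10)]]
frame = [[(12, 9, 11), (200, 180, 160)],
         [(10, 10, 10), (90, 90, 90)]]

mask = foreground_mask(frame, background)
print(mask)
```

In this toy case only the right-hand column differs strongly from the background, so only those pixels are flagged. A real pipeline would take the background from the estimation stage rather than a constant.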

Similar Papers
  • Research Article
  • Cite Count Icon 20
  • 10.1109/tip.2018.2859622
Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks.
  • Jul 30, 2018
  • IEEE Transactions on Image Processing
  • Ziyi Liu + 6 more

It is a challenging task to extract the segmentation mask of a target from a single noisy video, which involves object discovery coupled with segmentation. To solve this challenge, we present a method to jointly discover and segment an object from a noisy video, where the target disappears intermittently throughout the video. Previous methods either only fulfill video object discovery, or perform video object segmentation presuming the existence of the object in each frame. We argue that jointly conducting the two tasks in a unified way will be beneficial. In other words, video object discovery and video object segmentation tasks can facilitate each other. To validate this hypothesis, we propose a principled probabilistic model, where two dynamic Markov networks are coupled: one for discovery and the other for segmentation. When conducting the Bayesian inference on this model using belief propagation, the bi-directional message passing reveals a clear collaboration between these two inference tasks. We validated our proposed method on five data sets. The first three video data sets, i.e., the SegTrack data set, the YouTube-objects data set, and the Davis data set, are not noisy, in that all video frames contain the objects. The two noisy data sets, i.e., the XJTU-Stevens data set and the Noisy-ViDiSeg data set, newly introduced in this paper, both have many frames that do not contain the objects. When compared with the state of the art, it is shown that although our method produces inferior results on video data sets without noisy frames, we are able to obtain better results on video data sets with noisy frames.

  • Book Chapter
  • Cite Count Icon 1
  • 10.1007/978-981-10-3002-4_26
Video Object Detection and Segmentation Based on Proposal Boxes
  • Jan 1, 2016
  • Xiaodi Zhang + 3 more

In this paper, we propose a new method to detect and segment the foreground object in a video automatically. Given a video sequence, our method begins by generating proposal bounding boxes in each frame, according to both static and motion cues. The boxes are used to detect the primary object in the sequence. We score each box by its likelihood of containing a foreground object, connect boxes in adjacent frames, and calculate the similarity between them. A layered Directed Acyclic Graph is constructed to select the object box in each frame. With the help of the object boxes, we model the motion and appearance of the object. Motion cues and appearance cues are combined in an energy minimization framework to obtain a coherent foreground object segmentation over the whole video. Our method reports results comparable with state-of-the-art works on a challenging benchmark dataset.

  • Research Article
  • Cite Count Icon 2
  • 10.1117/1.jei.22.2.023005
Automatic video matting based on hybrid video object segmentation and closed-form matting
  • Apr 26, 2013
  • Journal of Electronic Imaging
  • Wu-Chih Hu + 2 more

This paper proposes an automatic video matting method that uses hybrid video object segmentation and closed-form matting. Hybrid video object segmentation, based on background construction-based video object segmentation and foreground extraction-based video object segmentation, is used to obtain reliable foreground objects. Next, the trimaps are automatically generated using two different scanning windows from the hybrid video object segmentation results. Finally, closed-form matting is used, with the automatically generated trimaps, to yield the alpha mattes and the foreground objects with an opacity estimate. Experimental results show that the proposed method outperforms state-of-the-art automatic video matting methods based on closed-form matting.

  • Book Chapter
  • Cite Count Icon 3
  • 10.4018/978-1-59904-845-1.ch106
Video Object Segmentation
  • Jan 1, 2009
  • Ee Ping Ong + 1 more

Video object segmentation aims to extract different video objects from a video (i.e., a sequence of consecutive images). It has attracted considerable interest and substantial research effort over the past decade because it is a prerequisite for visual content retrieval (e.g., MPEG-7 related schemes), object-based compression and coding (e.g., MPEG-4 codecs), object recognition, object tracking, security video surveillance, traffic monitoring for law enforcement, and many other applications. Video object segmentation is a nonstandardized but indispensable component of an MPEG-4/7 scheme if a complete solution is to be developed. In fact, in order to utilize MPEG-4 object-based video coding, video object segmentation must first be carried out to extract the required video object masks. Video object segmentation is an even more important issue in military applications such as real-time remote identification and tracking of missiles, vehicles, and soldiers. Other possible applications include home, office, and warehouse security, where monitoring and recording intruders or foreign objects, alerting the personnel concerned, and transmitting the segmented foreground objects over a bandwidth-hungry channel while intruders are present are of particular interest. Thus, a fully automatic video object segmentation tool is very useful and has wide practical applications in everyday life, where it can contribute to improved efficiency and to savings in time, manpower, and cost.

  • Research Article
  • Cite Count Icon 6
  • 10.1016/j.image.2020.115858
Video object tracking and segmentation with box annotation
  • Apr 20, 2020
  • Signal Processing: Image Communication
  • Ye Wang + 6 more

  • Conference Article
  • 10.1145/3293353.3293381
VidSeg-GAN
  • Dec 18, 2018
  • Saptakatha Adak + 1 more

Video object segmentation aims to segment objects in a video sequence, given some user annotation which indicates the object of interest. Although Convolutional Neural Networks (CNNs) have been used in the recent past for the purpose of foreground segmentation in videos, adversarial training methods have not been used effectively to solve this problem, in spite of its extensive use for solving many other problems in Computer Vision. Earlier, flow features and motion trajectories have been extensively used to capture the temporal consistency between subsequent frames to segment moving objects in videos. However, we show that our proposed framework of processing the video frames independently using a deep generative adversarial network (GAN), is able to maintain the temporal coherency across frames without the use of any explicit trajectory based information, to provide superior results. Our main contribution lies in introducing a GAN based framework along with the incorporation of an Intersection-over-Union score based novel cost function for training the model, to solve the problem of foreground object segmentation in videos. The proposed method, when evaluated on popular real-world video segmentation datasets viz. DAVIS, SegTrack-v2 and YouTube-Objects, exhibits substantial performance gain over the recent state-of-the-art methods.
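
The abstract mentions an Intersection-over-Union based cost function but does not state its form. As a rough illustration only, the sketch below computes the plain IoU between two hard binary masks and the corresponding `1 - IoU` loss; the function names and toy masks are assumptions, and a trainable version would need a differentiable (soft) formulation that is not shown here.

```python
def iou(pred, target):
    """Intersection-over-Union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define their IoU as 1.0 by convention.
    return inter / union if union else 1.0

def iou_loss(pred, target):
    """Loss that is 0 for a perfect match and 1 for disjoint masks."""
    return 1.0 - iou(pred, target)

# Hypothetical 2x2 predicted and ground-truth masks.
pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [1, 0]]

print(iou(pred, target), iou_loss(pred, target))
```

Here the masks share one pixel out of three covered in total, so the IoU is one third. Treating two empty masks as a perfect match is a convention chosen for this sketch.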

  • Conference Article
  • Cite Count Icon 9
  • 10.1109/icip.2003.1246921
Automatic video object segmentation via 3D structure tensor
  • Nov 24, 2003
  • H.-Y Wang + 1 more

The 3D structure tensor is an effective representation of the local motion information of a video object (VO) and has been exploited for performing VO segmentation. However, existing 3D structure tensor-based VO segmentation approaches often yield inaccurate object boundaries, and heavy computation is needed to estimate the dense motion field. To address these concerns, a new scheme is proposed in this paper that generates spatially constrained motion masks without computing a dense motion field. To this end, scale-adaptive spatio-temporal filtering steered by the condition number is developed to handle multiple motions contributed by different VOs. As rigid and nonrigid VO motions need to be handled differently during mask generation, rigidity analysis is conducted based on the standard deviation of correlation coefficients over a range of successive video frames in order to identify whether each video sequence frame contains rigid or nonrigid motion. Various masks, such as eigenmaps, coherency-measurement maps, and change-detection maps, are produced and fused to generate the final VO motion masks. With boundary refinement by graph-based spatial segmentation, experimental results on different kinds of test sequences show accurately segmented moving VOs.

  • Book Chapter
  • Cite Count Icon 4
  • 10.1007/978-3-030-42128-1_5
Unsupervised Learning of Object Segmentation in Video with Highly Probable Positive Features
  • Jan 1, 2020
  • Marius Leordeanu

Many times when learning without human supervision, it is possible to tell whether a certain cue or data sample is likely to belong to the positive class of interest. In this chapter, we study this case and show that such highly probable positive features could be reliably used for learning in the real natural world, without human supervision. We chose as our use case the problem of foreground object segmentation, since it is one of the fundamental ones in vision. The main task, in this case, is to automatically separate the main object of interest present in a video sequence from its surrounding background. An efficient solution to this task would have immense practical value. It would enable large-scale video interpretation at a high semantic level in the absence of costly manual labeling. In this chapter, we present several unsupervised algorithms for generating foreground object soft masks based on automatic selection and learning from highly probable positive features. We start with a very simple and fast, yet surprisingly effective method that is able to produce robust object segmentations by using only simple colors as features. While being very simple to implement and understand, the algorithm constitutes the basis for a more general principle for learning from highly probable positive features, which we study theoretically and develop further within a more complex method for unsupervised video object segmentation. One important module in this algorithm connects to the feature selection by clustering method presented in Chap. 4; that approach is used in this case for learning an effective and robust patch-based descriptor based on color co-occurrences. We also introduce a novel and fast algorithm for background subtraction, called VideoPCA, based on modeling the background scene with a linear subspace and regarding the main foreground objects as regions that do not belong to that subspace.
All algorithms and ideas presented are, at the core, connected by a single fundamental idea—that of learning from highly probable positive features, which are easy to detect in an unsupervised way with high precision and are effective, together, in learning powerful classifiers. The idea naturally starts and evolves from the insights and conclusions of the previous chapters presented in the book. In this chapter, we show that such HPP features can be selected efficiently by taking into consideration the spatiotemporal appearance and motion consistency of the object in the video sequence. We also emphasize the role of the contrasting properties between the foreground object and its background. Our final foreground segmentation model is created over several stages: we start from pixel-level analysis and move to descriptors that consider information over groups of pixels combined with efficient motion analysis. We also prove theoretical properties of our unsupervised learning method, which under some mild constraints is guaranteed to learn the correct classifier even in the unsupervised case. We achieve competitive and even state-of-the-art results on the challenging YouTube-Objects and SegTrack datasets, while being at least one order of magnitude faster than the competition. The strong performance of our method, along with its theoretical properties, constitutes another step towards solving unsupervised discovery in video.

  • Research Article
  • Cite Count Icon 33
  • 10.1109/tcsvt.2013.2242595
Video Object Segmentation and Tracking Framework With Improved Threshold Decision and Diffusion Distance
  • Jun 1, 2013
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shao-Yi Chien + 3 more

Video object segmentation and tracking are two essential building blocks of smart surveillance systems. However, there are several issues that need to be resolved. Threshold decision is a difficult problem for video object segmentation with a multi-background model. In addition, some conditions make robust video object tracking difficult. These conditions include nonrigid object motion, target appearance variations due to changes in illumination, and background clutter. In this paper, a video object segmentation and tracking framework is proposed for smart cameras in visual surveillance networks with two major contributions. First, we propose a robust threshold decision algorithm for video object segmentation with a multi-background model. Second, we propose a video object tracking framework based on a particle filter with the likelihood function composed of diffusion distance for measuring color histogram similarity and motion clue from video object segmentation. The proposed framework can track nonrigid moving objects under drastic changes in illumination and background clutter. Experimental results show that the presented algorithms perform well for several challenging sequences, and our proposed methods are effective for the aforementioned issues.

  • Research Article
  • Cite Count Icon 12
  • 10.1016/j.jvcir.2011.10.008
Video object segmentation in rainy situations based on difference scheme with object structure and color analysis
  • Nov 4, 2011
  • Journal of Visual Communication and Image Representation
  • Wu-Chih Hu + 3 more

  • Conference Article
  • Cite Count Icon 1
  • 10.1109/icosst48232.2019.9043975
Object Segmentation in Video Sequences by using Single Frame Processing
  • Dec 1, 2019
  • Muhammad Hamza Bhatti + 2 more

Object segmentation, detection, and tracking in videos is one of the most important tasks in computer vision. It is necessary in all deployed real-time surveillance systems. Various unsupervised and semi-supervised video object segmentation techniques have been implemented and have shown good results. However, all of these techniques process every frame of a video sequence, which requires a large amount of training data and results in a long computation time. In this paper, a semi-supervised technique is proposed which segments an object in a video by processing just a single frame of the sequence. In this framework, a fully convolutional network is used to separate the foreground from the image and create the mask of the object, and the object is then segmented with the help of this mask. The foreground separation in a frame is done using a pre-trained network, while the rest of the network is trained and tested on the DAVIS dataset. The results show that the proposed framework takes less computation time and also improves the overall accuracy of video object segmentation by 10% compared to previous techniques.

  • Conference Article
  • Cite Count Icon 9
  • 10.1145/1180639.1180805
Video object segmentation by motion-based sequential feature clustering
  • Oct 23, 2006
  • Mei Han + 2 more

Segmentation of video foreground objects from background has many important applications, such as human computer interaction, video compression, multimedia content editing and manipulation. Most existing methods work on image pixels or color segments which are computationally expensive. Some methods require extensive manual inputs, static cameras, and/or rigid scenes. In this paper we propose a fully automatic foreground segmentation method based on sequential clustering of sparse image features. The sparseness makes the method computationally efficient. We use both edge and corner points extracted from each video frame. A joint spatio-temporal linear regression method is developed to compute sparse motion layers of M consecutive frames jointly under the temporal consistency constraint. Once the sparse motion layers have been identified for each frame, the corresponding dense motion layers are created using the Markov Random Field (MRF) model. The MRF model assigns the rest of the image pixels to the motion layers by considering both the color attributes and the spatial relations between each pixel and its surrounding edge/corner points. Experimental evaluations on videos taken by webcams show the effectiveness of the proposed method.

  • Conference Article
  • Cite Count Icon 43
  • 10.1109/iccv.2017.544
Unsupervised Object Segmentation in Video by Efficient Selection of Highly Probable Positive Features
  • Oct 1, 2017
  • Emanuela Haller + 1 more

We address an essential problem in computer vision, that of unsupervised foreground object segmentation in video, where a main object of interest in a video sequence should be automatically separated from its background. An efficient solution to this task would enable large-scale video interpretation at a high semantic level in the absence of costly manual labeling. We propose an efficient unsupervised method for generating foreground object soft masks based on automatic selection and learning from highly probable positive features. We show that such features can be selected efficiently by taking into consideration the spatio-temporal appearance and motion consistency of the object in the video sequence. We also emphasize the role of the contrasting properties between the foreground object and its background. Our model is created over several stages: we start from pixel-level analysis and move to descriptors that consider information over groups of pixels combined with efficient motion analysis. We also prove theoretical properties of our unsupervised learning method, which under some mild constraints is guaranteed to learn the correct classifier even in the unsupervised case. We achieve competitive and even state-of-the-art results on the challenging YouTube-Objects and SegTrack datasets, while being at least one order of magnitude faster than the competition. We believe that the strong performance of our method, along with its theoretical properties, constitutes a solid step towards solving unsupervised discovery in video.

  • Conference Article
  • Cite Count Icon 1
  • 10.1109/wcica.2008.4594556
Video Object Segmentation Based on Multi-Feature Clustering
  • Jun 1, 2008
  • Shuangyan Hu + 3 more

As a prerequisite of emerging content-based multimedia technologies, video object segmentation is of great importance. This paper proposes a method of video object segmentation based on multi-feature clustering. First, the twice-difference image is obtained from three successive video frames. Then, background noise is eliminated by estimating the feature parameter, and the video object motion area is extracted. Next, the improved FCM clustering method is employed to segment the motion area, and the video object mask is obtained by applying morphological processing to that result. Finally, the video object is extracted. Experimental results show that the proposed method performs well for video object segmentation and outperforms the compared method from the literature in spatial accuracy.
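
The abstract does not define the "twice-difference image". One common reading is three-frame differencing: take the absolute differences of the two consecutive frame pairs and keep, per pixel, the smaller of the two. The sketch below follows that assumed reading on grayscale frames; the function names, the threshold of 20, and the toy frames are hypothetical, and the noise elimination, FCM clustering, and morphological steps are not shown.

```python
def twice_difference(prev, curr, nxt):
    """Per-pixel minimum of |curr - prev| and |nxt - curr| (assumed definition).

    A pixel scores high only if it changed in BOTH frame pairs, which tends
    to localise the moving object at its position in the middle frame.
    """
    return [
        [min(abs(c - p), abs(n - c)) for p, c, n in zip(prev_row, curr_row, next_row)]
        for prev_row, curr_row, next_row in zip(prev, curr, nxt)
    ]

def motion_mask(prev, curr, nxt, threshold=20):
    """Binary motion mask from the twice-difference image."""
    diff = twice_difference(prev, curr, nxt)
    return [[1 if d > threshold else 0 for d in row] for row in diff]

# A bright one-pixel object moving right across a dark 1x4 strip (toy data).
prev = [[10, 200, 10, 10]]
curr = [[10, 10, 200, 10]]
nxt = [[10, 10, 10, 200]]

print(twice_difference(prev, curr, nxt))
print(motion_mask(prev, curr, nxt))
```

In this toy strip, only the pixel occupied by the object in the middle frame changes in both pairs, so the mask isolates that position and suppresses the ghosts left at the previous and next positions.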

  • Research Article
  • Cite Count Icon 2
  • 10.1109/access.2022.3178609
Spatiotemporal Activity Semantics Understanding Based on Foreground Object Segmentation: iCounter Scenario
  • Jan 1, 2022
  • IEEE Access
  • Tzu-Wei Yu + 5 more

Foreground object segmentation that captures the spatial and temporal information of moving objects in video is the most fundamental task for activity understanding in many intelligent applications, such as smart stores. Recently, several methods have been proposed for the detection and recognition of activity based on object segmentation. However, these methods are often inaccurate because they do not keep object segments temporally consistent across frames. In this work, we propose a hierarchical approach for foreground object segmentation and activity semantics understanding from sequential video that preserves spatial and temporal connectivity in the frames. The proposed system consists of two main modules: (a) a concatenated deep learning network containing PSPNet and convolutional-GRU to segment the foreground of an object of interest; (b) an activity mining framework which incorporates three sub-modules: (i) a RetinaNet-based frame classifier to detect and count objects of interest; (ii) a time-domain activity and event detection algorithm; (iii) an image-based item query engine to recognize the shopping items. To evaluate the proposed approach, we designed a smart checkout-box called iCounter to collect a shopping activities dataset named "NOL-41", which is used in extensive experiments. The results show that the accuracy of the foreground object segmentation is 90.6%, the accuracy of the frame classification is 93.4%, the accuracy of activity event detection is 98.4%, and the accuracy of item query is 94.3%. Finally, the overall accuracy of the shopping list is 95.2%.
