Object Segmentation in Video Sequences by using Single Frame Processing
Object segmentation, detection and tracking in videos are among the most important tasks in computer vision and are core components of deployed real-time surveillance systems. Various unsupervised and semi-supervised video object segmentation techniques have been implemented and have shown effective results. However, these techniques process every frame of a video sequence, which requires large amounts of training data and long computation times. In this paper, a semi-supervised technique is proposed that segments an object in a video by processing only a single frame of the sequence. In this framework, a fully convolutional network separates the foreground from the image, creates a mask of the object, and then segments the object using this mask. Foreground separation is performed by a pre-trained network, while the rest of the network is trained and tested on the DAVIS dataset. The results show that the proposed framework requires less computation time and improves the overall accuracy of video object segmentation by 10% compared with previous techniques.
- Conference Article
34
- 10.1109/wacv56688.2023.00172
- Jan 1, 2023
Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g., $\mathcal{J}\&\mathcal{F}$, mAP, sMOTSA). As a result, published works usually target a particular benchmark and are not easily comparable to one another. We believe that the development of generalized methods that can tackle multiple tasks requires greater cohesion among these research sub-communities. In this paper, we aim to facilitate this by proposing BURST, a dataset which contains thousands of diverse videos with high-quality object masks, and an associated benchmark with six tasks involving object tracking and segmentation in video. All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison and hence more effectively pool knowledge from different methods across different tasks. Additionally, we demonstrate several baselines for all tasks and show that approaches for one task can be applied to another with a quantifiable and explainable performance difference. Dataset annotations are available at: https://github.com/Ali2500/BURST-benchmark.
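For readers unfamiliar with the $\mathcal{J}\&\mathcal{F}$ score mentioned in this abstract: $\mathcal{J}$ is region similarity (the intersection-over-union of predicted and ground-truth masks), $\mathcal{F}$ is a boundary F-measure, and $\mathcal{J}\&\mathcal{F}$ is their mean. The following is a minimal NumPy sketch, assuming binary masks and a square pixel tolerance `tol` for boundary matching. The function names are hypothetical, and the official DAVIS/BURST evaluation code extracts and matches boundaries differently, so its numbers will not match this sketch exactly.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection-over-union of two binary masks (defined as 1.0 when both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def _dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2r+1) x (2r+1) square, computed by OR-ing shifted copies."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels with at least one background pixel among their 8 in-image neighbours."""
    mask = mask.astype(bool)
    return mask & _dilate(~mask, 1)


def boundary_f(pred: np.ndarray, gt: np.ndarray, tol: int = 1) -> float:
    """F: F-measure of boundary pixels, matched within a square tolerance of `tol` pixels."""
    pb, gb = boundary(pred), boundary(gt)
    if not pb.any() and not gb.any():
        return 1.0
    if not pb.any() or not gb.any():
        return 0.0
    precision = (pb & _dilate(gb, tol)).sum() / pb.sum()
    recall = (gb & _dilate(pb, tol)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def j_and_f(pred: np.ndarray, gt: np.ndarray, tol: int = 1) -> float:
    """Mean of region similarity J and boundary F-measure F."""
    return 0.5 * (region_similarity(pred, gt) + boundary_f(pred, gt, tol))
```

For example, a 4×4 square mask shifted one pixel sideways against its ground truth gives J = 12/20 = 0.6, while its boundary still matches within a 1-pixel tolerance (F = 1.0), so J&F = 0.8. This illustrates why the two components are reported together.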
- Conference Article
9
- 10.1109/icmlc.2008.4620823
- Jul 1, 2008
As a critical step in many multimedia applications, shot boundary detection has attracted considerable research interest in recent years. Most existing methods measure the similarity between video frames based on low-level features. However, these methods are sensitive not only to changes in brightness, color and object motion, but also to camera motion and video quality. This paper proposes a new shot boundary detection method for news video based on video object segmentation and tracking. It combines three main techniques: partitioned histogram comparison, and video object segmentation and tracking based on wavelet analysis. Partitioned histogram comparison serves as a first filter that effectively reduces the number of frames requiring object segmentation and tracking. The unsupervised wavelet-based video object segmentation and tracking is robust to the problems mentioned above. The method was extensively tested on more than 3 hours of CCTV and CNN news programs, achieving 96.4% recall and 97.2% precision.
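A partitioned (block-wise) histogram comparison used as a cheap first filter can be sketched in a few lines of NumPy. This sketch assumes grey-level frames, a 4×4 grid, 16-bin histograms, half the L1 distance as the per-block dissimilarity, and the two thresholds shown. All of these parameter choices and the function names are hypothetical rather than taken from the paper. Only transitions flagged here would be passed on to the more expensive segmentation and tracking stage.

```python
import numpy as np


def block_histograms(frame: np.ndarray, grid=(4, 4), bins: int = 16) -> np.ndarray:
    """Normalised grey-level histogram of each cell in a grid laid over the frame."""
    h, w = frame.shape
    gy, gx = grid
    hists = []
    for i in range(gy):
        for j in range(gx):
            block = frame[i * h // gy:(i + 1) * h // gy, j * w // gx:(j + 1) * w // gx]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            hists.append(hist / max(block.size, 1))
    return np.array(hists)


def is_candidate_boundary(prev, curr, grid=(4, 4), bins=16,
                          block_thresh=0.5, frac_thresh=0.5) -> bool:
    """Flag a possible cut when enough blocks change their histogram substantially."""
    hp = block_histograms(prev, grid, bins)
    hc = block_histograms(curr, grid, bins)
    # Half the L1 distance between normalised histograms lies in [0, 1].
    block_diff = 0.5 * np.abs(hp - hc).sum(axis=1)
    changed = block_diff > block_thresh
    return bool(changed.mean() > frac_thresh)


def candidate_frames(frames, **kwargs):
    """Indices t such that the transition frames[t-1] -> frames[t] passes the filter."""
    return [t for t in range(1, len(frames))
            if is_candidate_boundary(frames[t - 1], frames[t], **kwargs)]
```

Comparing blocks rather than whole-frame histograms means that a change confined to a few blocks, such as a presenter moving, does not exceed the fraction threshold on its own.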
- Research Article
6
- 10.1016/j.image.2020.115858
- Apr 20, 2020
- Signal Processing: Image Communication
Video object tracking and segmentation with box annotation
- Conference Article
21
- 10.5220/0001374604740479
- Jan 1, 2006
Object segmentation in a video sequence is an essential task in video processing and forms the foundation of content analysis, scene understanding, object-based video encoding (e.g., MPEG-4), surveillance and 2D-to-pseudo-3D conversion applications. The growing availability of video with higher spatial resolution requires new, more efficient algorithms for object detection and segmentation. This dissertation discusses a novel neural-network-based approach to background modeling for motion-based object segmentation in video sequences. In particular, we show how the Probabilistic Neural Network (PNN) architecture can be extended to form an unsupervised Bayesian classifier for video object segmentation. The resulting Background Modeling Neural Network (BNN) efficiently handles segmentation of natural-scene sequences with complex background motion and changes in illumination. The weights of the proposed network serve as the sole model of the background and are updated over time to reflect the observed background statistics. The approach is designed to enable an efficient, highly parallel hardware implementation; such a system would be able to achieve real-time segmentation of high-resolution image sequences.
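A PNN is essentially a Parzen-window (kernel density) classifier. The NumPy sketch below illustrates that idea applied per pixel; it is not the BNN architecture itself. For every pixel it keeps a small history of background samples, scores each new value with a Gaussian kernel density estimate, labels low-density pixels as foreground, and refreshes the history only at background pixels. The class name, `sigma`, `threshold` and the round-robin update rule are illustrative assumptions.

```python
import numpy as np


class KDEBackground:
    """Per-pixel Parzen-window (Gaussian kernel) background model for grey-level frames."""

    def __init__(self, first_frames, sigma: float = 10.0, threshold: float = 1e-3):
        # samples: (K, H, W) stack of past background observations (copied, not shared)
        self.samples = np.array(first_frames, dtype=np.float64)
        self.sigma = sigma
        self.threshold = threshold
        self._next = 0  # index of the sample slot to overwrite next

    def density(self, frame) -> np.ndarray:
        """Kernel density estimate of each pixel's current value under its sample history."""
        diff = np.asarray(frame, dtype=np.float64)[None] - self.samples
        kernel = np.exp(-0.5 * (diff / self.sigma) ** 2) / (np.sqrt(2 * np.pi) * self.sigma)
        return kernel.mean(axis=0)

    def apply(self, frame) -> np.ndarray:
        """Return a boolean foreground mask and update the model at background pixels only."""
        frame = np.asarray(frame, dtype=np.float64)
        fg = self.density(frame) < self.threshold
        slot = self.samples[self._next]  # view into the sample stack
        slot[~fg] = frame[~fg]
        self._next = (self._next + 1) % len(self.samples)
        return fg
```

Updating only background pixels keeps a stationary foreground object from being absorbed into the model immediately. In practice the trade-off between adaptation speed and such absorption is governed by the history length.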
- Research Article
33
- 10.1109/tcsvt.2013.2242595
- Jun 1, 2013
- IEEE Transactions on Circuits and Systems for Video Technology
Video object segmentation and tracking are two essential building blocks of smart surveillance systems. However, several issues need to be resolved. Threshold decision is a difficult problem for video object segmentation with a multi-background model. In addition, some conditions make robust video object tracking difficult, including nonrigid object motion, target appearance variations due to changes in illumination, and background clutter. In this paper, a video object segmentation and tracking framework is proposed for smart cameras in visual surveillance networks, with two major contributions. First, we propose a robust threshold decision algorithm for video object segmentation with a multi-background model. Second, we propose a video object tracking framework based on a particle filter whose likelihood function combines the diffusion distance, used to measure color histogram similarity, with a motion cue from video object segmentation. The proposed framework can track nonrigid moving objects under drastic changes in illumination and background clutter. Experimental results show that the presented algorithms perform well on several challenging sequences and that the proposed methods effectively address the aforementioned issues.
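To make such a likelihood concrete, the sketch below implements the diffusion distance between two 1-D histograms, that is, the L1 norm of their difference summed over the levels of a Gaussian pyramid. It then combines this distance with a motion cue taken from a segmentation mask to weight particles. The grey-level histograms, the 5-tap smoothing kernel, `sigma_c`, the fixed box size, the integer top-left particle positions and the multiplicative combination of the two cues are all assumptions made for illustration. The paper's exact likelihood may differ.

```python
import numpy as np

_KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # small Gaussian approximation


def diffusion_distance(h1, h2, levels: int = 3) -> float:
    """Sum of L1 norms of the histogram difference over a 1-D Gaussian pyramid."""
    d = np.asarray(h1, float) - np.asarray(h2, float)
    total = np.abs(d).sum()
    for _ in range(levels):
        if d.size < _KERNEL.size:
            break
        d = np.convolve(d, _KERNEL, mode="same")[::2]  # smooth, then downsample by 2
        total += np.abs(d).sum()
    return float(total)


def color_histogram(patch, bins: int = 16) -> np.ndarray:
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / max(patch.size, 1)


def particle_weights(frame, fg_mask, particles, ref_hist, box=(8, 8), sigma_c=0.2):
    """Normalised weights for particles given as (y, x) top-left corners of a fixed box."""
    bh, bw = box
    weights = []
    for y, x in particles:
        patch = frame[y:y + bh, x:x + bw]
        dist = diffusion_distance(color_histogram(patch), ref_hist)
        color_term = np.exp(-dist ** 2 / (2 * sigma_c ** 2))
        motion_term = fg_mask[y:y + bh, x:x + bw].mean()  # fraction of segmented pixels
        weights.append(color_term * motion_term)
    weights = np.asarray(weights)
    total = weights.sum()
    return weights / total if total > 0 else np.full(len(weights), 1.0 / len(weights))
```

Because the pyramid smooths the difference before it is summed again, a small shift of mass between neighbouring bins costs less than with a plain bin-by-bin L1 distance. This makes the color term more tolerant of illumination changes.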
- Conference Article
5
- 10.1109/icmlc.2005.1527816
- Jan 1, 2005
Moving object segmentation and tracking in video is important not only for motion detection and tracking in computer vision but also for MPEG-4. A new moving object segmentation and tracking method based on an improved PCA is presented in this paper. First, the improved PCA is used to segment the moving object in the original image sequence; in this step, three frames are enough to segment rigid and non-rigid moving objects from the background. Second, tracking is performed by shifting the three-frame window along the image sequence and repeating the first step in each window.
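The abstract does not specify the "improved PCA", so the sketch below substitutes plain uncentred PCA (a rank-1 SVD). Each pixel's three intensities form a temporal vector, and the dominant principal direction captures the static background. Pixels whose vector deviates from that rank-1 reconstruction by more than a threshold are marked as moving. For a static background and an object that occupies a pixel in only one of the three frames, the deviation in that frame is about twice the deviation in the other two. The assumed threshold `thresh` should therefore sit between those two levels. The function names are hypothetical.

```python
import numpy as np


def pca_motion_masks(frames, thresh: float = 60.0) -> np.ndarray:
    """Moving-pixel masks for a three-frame window via a rank-1 (uncentred) PCA background.

    frames: array of shape (3, H, W). Returns a boolean array of the same shape.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n, h, w = frames.shape
    X = frames.reshape(n, -1).T                 # (pixels, frames): one temporal vector per pixel
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]                                   # dominant temporal direction (static background)
    background = np.outer(X @ v, v)             # rank-1 reconstruction of every temporal vector
    residual = np.abs(X - background)           # part not explained by the static component
    return (residual > thresh).T.reshape(n, h, w)


def track(frames, thresh: float = 60.0):
    """Slide the three-frame window along the sequence; keep the middle frame's mask."""
    return [pca_motion_masks(frames[t:t + 3], thresh)[1] for t in range(len(frames) - 2)]
```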
- Research Article
1
- 10.34028/iajit/22/1/3
- Jan 1, 2025
- The International Arab Journal of Information Technology
In practical Video Object Segmentation (VOS), traditional techniques adapt poorly and produce insufficient segmentation results. To address these problems, an Unsupervised Video Object Segmentation (UVOS) technique based on convolutional networks is proposed. First, a method of decomposing expressions is used to handle the spatiotemporal relationship between the reference frame and the target frame, and the video object is reconstructed through similarity calculation. For target segmentation in motion scenes, a Single Linear Bottleneck Operator (SLBO) is introduced for feature extraction, and pooling compensation is used to reduce the loss of feature information. For general scene segmentation, a spatiotemporal similarity segmentation technique is introduced to segment targets in complex scenes. For the foreground segmentation test on sports scenes, the Change Detection Benchmark Dataset 2014 (CDNet2014) was used to test the model's loss in different scenarios. In adverse-weather training, the proposed model tends to converge after 40 iterations with a loss of 0.276, lower than that of the Foreground image Segmentation (FgSegNet_), Convolutional Networks for Biomedical Image Segmentation (MU Net) and Cascade Convolutional Neural Network (Cascade CNN) models. In the accuracy test, the proposed model, a real-time Foreground Segmentation network based on a single Linear Bottleneck and Pooling Compensation (FS-LBPC), tended to converge after 50 iterations with a precision of 0.963, the best among the four models compared (FgSegNet_, MU Net, Cascade CNN and FS-LBPC). For video scene segmentation, the Densely Annotated VIdeo Segmentation (DAVIS16) dataset was used; the model performed best on the horse-racing and animal-flight scenes, with segmentation accuracies of 0.976 and 0.965, respectively.
In summary, the proposed VOS technique performs well in practical scenarios and provides a useful technical reference for improving image and video processing and segmentation.
- Research Article
23
- 10.1016/j.imavis.2013.07.008
- Aug 7, 2013
- Image and Vision Computing
Integrating tracking with fine object segmentation
- Conference Article
1
- 10.1109/icspcc46631.2019.8960816
- Sep 1, 2019
Object segmentation in videos has been extensively investigated in recent years. However, semi-supervised video object segmentation remains a challenging research topic because temporal information is hard to model. Most methods treat video frames independently and lose the relationship between adjacent frames. To overcome this limitation, Semi-supervised Video Object Segmentation with Recurrent Neural Network (SVOSR) is proposed, which incorporates a convolutional gated recurrent unit (ConvGRU) to learn temporal information between adjacent frames. The method consists of three main parts. First, a feature extraction part produces spatial information from adjacent frames. Second, a relation part extracts temporal information from this adjacent spatial information. Third, a decoder combines the spatiotemporal information and infers the result. We propose the relation part and design the decoder to improve segmentation. Experiments on the DAVIS dataset show that our method achieves reasonable accuracy with inference an order of magnitude faster than OSVOS and other methods.
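A ConvGRU replaces the matrix products of a GRU with convolutions, so the hidden state keeps its spatial layout across adjacent frames. A minimal single-channel NumPy sketch of one update step follows. The 3×3 kernels, scalar biases, parameter dictionary keys and the convention h' = (1 - z)·h + z·h̃ are assumptions; some implementations swap the roles of z and 1 - z. SVOSR's actual layer sizes are not given in the abstract.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-size 2-D cross-correlation of a single-channel map with an odd-sized kernel (zero padding)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def convgru_step(x: np.ndarray, h: np.ndarray, p: dict) -> np.ndarray:
    """One ConvGRU update: x is the current frame's feature map, h the previous hidden state."""
    z = sigmoid(conv2d_same(x, p["Wz"]) + conv2d_same(h, p["Uz"]) + p["bz"])  # update gate
    r = sigmoid(conv2d_same(x, p["Wr"]) + conv2d_same(h, p["Ur"]) + p["br"])  # reset gate
    h_tilde = np.tanh(conv2d_same(x, p["W"]) + conv2d_same(r * h, p["U"]) + p["b"])  # candidate
    return (1.0 - z) * h + z * h_tilde  # per-pixel blend of previous state and candidate


def run_sequence(feature_maps, p: dict) -> np.ndarray:
    """Carry the hidden state across adjacent frames, starting from zeros."""
    h = np.zeros_like(feature_maps[0], dtype=np.float64)
    for x in feature_maps:
        h = convgru_step(x, h, p)
    return h
```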
- Research Article
3
- 10.3233/fi-2009-0025
- Jan 1, 2009
- Fundamenta Informaticae
This paper proposes a real-time scheme for object segmentation in video. In the first stage, a segmentation based on pairwise region comparison oversegments the image into superpixels. Next, the algorithm applies a graph cut built on these superpixels instead of on the image pixels. Because the optimization is performed on a simpler graph, object segmentation runs in less time. Tracking object features over time improves the segmentation of the object from one frame to the next. The segmentation information supports following the entire object, rather than just a few features on it. The objects are segmented correctly as complete entities, despite high variability in object shape and a cluttered background. Experimental results illustrate the efficiency and effectiveness of the algorithm.
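With each superpixel as a node, the labelling can be posed as an s-t minimum cut. Terminal edges carry per-superpixel foreground and background costs, and edges between adjacent superpixels carry a smoothness penalty. The pure-Python sketch below uses Edmonds-Karp max-flow for clarity; practical systems usually use a faster solver such as Boykov-Kolmogorov. The cost values and the function name are hypothetical, and the abstract does not state how the paper derives its costs.

```python
from collections import deque


def min_cut_labels(cost_fg, cost_bg, edges, smoothness=1.0):
    """Binary labelling of superpixels by s-t minimum cut (Edmonds-Karp max-flow).

    cost_fg[i] / cost_bg[i]: cost of labelling superpixel i foreground / background.
    edges: iterable of (i, j, weight) for adjacent superpixels (weight = similarity).
    Returns the set of superpixels labelled foreground (source side of the cut).
    """
    n = len(cost_fg)
    S, T = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i in range(n):
        add(S, i, cost_bg[i])  # cut (paid) when i ends on the sink, i.e. background, side
        add(i, T, cost_fg[i])  # cut (paid) when i ends on the source, i.e. foreground, side
    for i, j, wgt in edges:
        add(i, j, smoothness * wgt)
        add(j, i, smoothness * wgt)

    while True:
        parent = {S: None}  # BFS tree over edges with remaining capacity
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break  # no augmenting path: parent now holds every node reachable from S
        flow, v = float("inf"), T
        while parent[v] is not None:  # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:  # push flow and update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    return {i for i in parent if i < n}
```

For example, a superpixel whose own costs mildly favour background is pulled to foreground when a strong similarity edge links it to a confident foreground neighbour. This is the smoothing effect that the pairwise edges provide.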
- Book Chapter
3
- 10.1007/978-3-319-24947-6_25
- Jan 1, 2015
In this paper we propose a method for foreground object segmentation in videos using an improved version of the GrabCut algorithm. Motivated by applications in de-identification, we consider a static camera scenario and take into account common problems with the original algorithm that can result in poor segmentation. Our improvements are as follows: (i) using background subtraction, we build GMM-based segmentation priors; (ii) in building foreground and background GMMs, the contributions of pixels are weighted depending on their distance from the boundary of the object prior; (iii) probabilities of pixels belonging to foreground or background are modified by taking into account the prior pixel classification as well as its estimated confidence; and (iv) the smoothness term of GrabCut is modified by discouraging boundaries further away from the object prior. We perform experiments on CDnet 2014 Pedestrian Dataset and show considerable improvements over a reference implementation of GrabCut.
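Improvement (i) can be illustrated with a small NumPy sketch that turns a background-subtraction difference image into an initial GrabCut label mask. It uses the label values OpenCV defines for `cv2.grabCut` (0 = `GC_BGD`, 1 = `GC_FGD`, 2 = `GC_PR_BGD`, 3 = `GC_PR_FGD`). The three thresholds and the function name are assumptions. The distance-based pixel weighting and modified energy terms of items (ii) to (iv) are not shown.

```python
import numpy as np

# Label values used in OpenCV's GrabCut mask (cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def grabcut_prior_mask(frame, background, sure_fg=60.0, maybe_fg=20.0, sure_bg=5.0):
    """Initial GrabCut labels from the absolute difference to a static background model."""
    diff = np.abs(np.asarray(frame, float) - np.asarray(background, float))
    if diff.ndim == 3:  # colour frames: use the largest per-channel difference
        diff = diff.max(axis=2)
    mask = np.full(diff.shape, GC_PR_BGD, dtype=np.uint8)  # default: probably background
    mask[diff <= sure_bg] = GC_BGD
    mask[diff > maybe_fg] = GC_PR_FGD
    mask[diff > sure_fg] = GC_FGD
    return mask
```

The resulting mask could then be refined with OpenCV as `cv2.grabCut(frame, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)`, with `bgd_model` and `fgd_model` initialised as `np.zeros((1, 65), np.float64)` and `frame` an 8-bit, 3-channel image. Pixels labelled `GC_FGD` or `GC_PR_FGD` afterwards form the foreground.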
- Research Article
28
- 10.1016/j.patrec.2004.07.009
- Sep 11, 2004
- Pattern Recognition Letters
Unsupervised video object segmentation and tracking based on new edge features
- Research Article
5
- 10.1007/s11042-019-07781-0
- Jun 24, 2019
- Multimedia Tools and Applications
Salient object segmentation in videos is generally broken up into a video segmentation part and a saliency assignment part. Object proposals, which are used to segment the image, have recently had a significant impact on many computer vision applications, including image segmentation, object detection and, most recently, saliency detection in still images. However, their usage has not yet been evaluated for salient object segmentation in videos. Therefore, in this paper, we investigate the application of object proposals to salient object segmentation in videos. In addition, we propose a new motion feature derived from the optical flow structure tensor for video saliency detection. Experiments on two standard benchmark datasets for video saliency show that the proposed motion feature improves saliency estimation results and that object proposals are an efficient method for salient object segmentation. Results on the challenging SegTrack v2 and Fukuchi benchmark datasets show that we significantly outperform the state of the art.
- Conference Article
65
- 10.1109/iscas.1997.622202
- Jun 9, 1997
Object segmentation and tracking is a key component of the new generation of digital video representation, transmission and manipulation. Example applications include content-based video databases and video editing. We present a general schema for video object modeling that incorporates low-level visual features and hierarchical grouping. The schema provides a general framework for video object extraction, indexing and classification. In addition, we present new video segmentation and tracking algorithms based on salient color and affine motion features. Color features are used for intra-frame segmentation; affine motion is used to track image segments over time. Experimental evaluation results using several test video streams are included.
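Affine tracking of an image segment can be illustrated by fitting the six affine parameters to point correspondences between consecutive frames with linear least squares. Such correspondences might come, for example, from block matching or optical flow. The NumPy sketch below is an illustration under that assumption; the abstract does not say how the paper estimates its affine parameters, and the function names are hypothetical.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2-D affine motion x' = A x + t from matched points of one segment.

    src, dst: arrays of shape (N, 2) with N >= 3 non-collinear points.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    design = np.hstack([src, np.ones((len(src), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)   # (3, 2) = [A^T; t]
    A, t = params[:2].T, params[2]
    return A, t


def warp_points(points, A, t):
    """Move a segment's points (e.g. its boundary) into the next frame."""
    return np.asarray(points, float) @ A.T + t
```

Because an affine model covers translation, rotation, scaling and shear with only six parameters, a handful of reliable correspondences per segment is enough. The least-squares fit averages out noise when more are available.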
- Conference Article
3
- 10.1109/nafips.2012.6291048
- Aug 1, 2012
In this paper, a discrete-time cellular neural network (DTCNN) model for dynamic object segmentation in videos is presented. The proposed method involves three main stages: dynamic background registration, dynamic object detection and object segmentation improvement. Two DTCNNs are used: one performs object detection and the other performs morphological operations to improve the object segmentation. Visual and quantitative results are compared with those of a self-organizing map (SOM)-like dynamic object detection approach. The reported experiments show that the proposed method gives acceptable results, with some improvement over the SOM approach because the DTCNN method does not need human intervention for parameter adjustment.
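For the morphological stage, a DTCNN cell computes x(k+1) = A * y(k) + B * u + I and y = sgn(x) on a bipolar image (+1 object, -1 background). Set the feedback template A to zero and use an all-ones 3×3 control template B. A bias of I = 8 then yields binary dilation and I = -8 yields erosion: the 3×3 sum equals -9 only when every neighbour is background and 9 only when every neighbour is object. The sketch below uses this standard template construction for illustration; the paper's actual templates are not given in the abstract.

```python
import numpy as np


def dtcnn_step(u: np.ndarray, B: np.ndarray, bias: float) -> np.ndarray:
    """One step of a discrete-time CNN with zero feedback template (A = 0).

    u: bipolar input image (+1 = object, -1 = background); B: 3x3 control template.
    Output y = sign(B * u + bias), with pixels outside the image treated as -1.
    """
    h, w = u.shape
    up = np.pad(u, 1, constant_values=-1)
    x = np.full((h, w), float(bias))
    for i in range(3):
        for j in range(3):
            x += B[i, j] * up[i:i + h, j:j + w]
    return np.where(x > 0, 1, -1)


ONES = np.ones((3, 3))


def dilate(u):
    return dtcnn_step(u, ONES, 8.0)   # +1 if any neighbour is +1


def erode(u):
    return dtcnn_step(u, ONES, -8.0)  # +1 only if all neighbours are +1


def opening(u):
    return dilate(erode(u))           # removes isolated foreground noise
```

With integer inputs the state is always odd (a sum of nine ±1 terms plus ±8), so the sign function never receives a tie at zero.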