Extraction of Video Objects via Surface Optimization and Voronoi Order
We implement a video object segmentation system that integrates the novel concept of Voronoi Order with existing surface optimization techniques to support the MPEG-4 functionality of object-addressable video content in the form of video objects. The major enabling technologies for the MPEG-4 standard are systems that compute video object segmentation, i.e., the extraction of video objects from a given video sequence. Our surface optimization formulation describes the video object segmentation problem in the form of an energy function that integrates many visual processing techniques. By optimizing this surface, we balance visual information against predictions of models with a priori information and extract video objects from a video sequence. Since the global optimization of such an energy function is still an open problem, we use Voronoi Order to decompose our formulation into a tractable optimization via dynamic programming within an iterative framework. Finally, we show the results of the system on the MPEG-4 test sequences, introduce a novel objective measure, and compare our results against sequences hand-segmented by the MPEG-4 committee.
- Research Article
5
- 10.1016/j.patcog.2008.02.007
- Feb 29, 2008
- Pattern Recognition
Applying the multi-category learning to multiple video object extraction
- Conference Article
1
- 10.1109/icassp.2006.1661424
- May 14, 2006
As a requisite of content-based multimedia technologies, video object (VO) extraction is of great importance. In recent years, approaches have been proposed to handle VO extraction directly as a classification problem. This type of method calls for state-of-the-art classifiers because the extraction performance is directly related to the accuracy of classification. Promising results have been reported for single object extraction using support vector machines (SVMs) and their extensions such as psi-learning. Multiple object extraction, on the other hand, still poses great difficulty because multi-category classification is an ongoing research topic in machine learning. This paper introduces the newly developed multi-category psi-learning as the multiclass classifier for multiple VO extraction, and demonstrates its effectiveness and advantages through experiments.
- Research Article
311
- 10.1109/76.988659
- Jan 1, 2002
- IEEE Transactions on Circuits and Systems for Video Technology
The new video-coding standard MPEG-4 enables content-based functionality, as well as high coding efficiency, by taking into account shape information of moving objects. A novel algorithm for segmentation of moving objects in video sequences and extraction of video object planes (VOPs) is proposed. For the case of multiple video objects in a scene, the extraction of a specific single video object (VO) based on connected components analysis and smoothness of VO displacement in successive frames is also discussed. Our algorithm begins with a robust double-edge map derived from the difference between two successive frames. After removing edge points which belong to the previous frame, the remaining edge map, the moving edge (ME), is used to extract the VOP. The proposed algorithm is evaluated on an indoor sequence captured by a low-end camera as well as MPEG-4 test sequences and produces promising results.
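The moving-edge idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gradient-threshold edge detector below is a simple stand-in for whatever edge detector the paper uses, and the names `edge_map`, `moving_edge`, and the threshold value are hypothetical.

```python
def edge_map(img, thresh):
    """Binary edge map from forward-difference gradients.

    Assumption: a plain gradient threshold stands in for a real edge detector.
    """
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            edges[y][x] = abs(gx) + abs(gy) >= thresh
    return edges


def moving_edge(prev, curr, thresh=50):
    """Edges of the frame difference that are not edges of the previous frame."""
    # The difference image contains edges of the object at both its old
    # and new positions (the "double-edge map").
    diff = [[abs(c - p) for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]
    double_edges = edge_map(diff, thresh)
    prev_edges = edge_map(prev, thresh)
    # Dropping edge points that belong to the previous frame leaves the
    # edges at the object's current position.
    return [[d and not p for d, p in zip(rd, rp)]
            for rd, rp in zip(double_edges, prev_edges)]


# One-row frames: a bright object moves from columns 1-2 to columns 5-6.
prev_frame = [[0, 100, 100, 0, 0, 0, 0, 0, 0]]
curr_frame = [[0, 0, 0, 0, 0, 100, 100, 0, 0]]
me = moving_edge(prev_frame, curr_frame)
```

In this toy case the surviving edge points coincide with the edges of the object in the current frame. When the old and new object positions overlap or touch, some edges cancel in the difference image, which is one reason a robust double-edge map matters in practice.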
- Conference Article
3
- 10.1109/cyber.2011.6011783
- Mar 1, 2011
Fast and effective indexing and retrieval from large amounts of surveillance video are very important issues. This paper proposes a novel object-semantic-based surveillance video indexing and retrieval system, which is mainly composed of two modules: video analysis and video retrieval. In video analysis, the system first segments video objects (VO) from surveillance videos, and the fundamental semantic information is then extracted and indexed into the database. A standard Gaussian Mixture Model (GMM) approach is applied in video object extraction (VOE) and video object segmentation (VOS). During retrieval, the query is converted to semantic information without re-processing the surveillance videos. Color histograms, edge orientation histograms, and SIFT (Scale-Invariant Feature Transform) features are considered together as the key features and similarity measurements to accurately match video objects (VOM). The experiment shows that a user can retrieve the required videos effectively.
- Research Article
- 10.56726/irjmets48847
- Feb 3, 2024
- International Research Journal of Modernization in Engineering Technology and Science
In contemporary computer vision, the extraction of video objects plays a crucial role in various applications, including surveillance, autonomous vehicles, and augmented reality. This research introduces a spatially adaptive attention mechanism in conjunction with Convolutional Neural Networks (CNNs) to enhance the effectiveness of video object extraction. The proposed mechanism strategically allocates attention to pertinent spatial regions, enabling the CNN to dynamically adjust its focus during video processing. By capitalizing on the inherent spatial characteristics of objects in video sequences, the model achieves heightened accuracy and efficiency in identifying and extracting objects of interest. In experimental and result analysis, the findings showcase the superior performance of the spatially adaptive attention mechanism compared to traditional methods. The integration of this mechanism into CNNs demonstrates promising outcomes, enhancing the precision and robustness of video object extraction. After object extraction, Mask R-CNN is used for video instance segmentation. This research contributes to the progression of video analysis techniques, opening avenues for more effective applications across diverse domains. The spatially adaptive attention mechanism offers a nuanced solution to the challenges associated with video object extraction, representing a noteworthy advancement in the field of computer vision.
- Research Article
- 10.4028/www.scientific.net/amm.303-306.2254
- Feb 1, 2013
- Applied Mechanics and Materials
This paper presents an algorithm that extracts the left-channel video object using high-order statistical change detection and segments the right-channel video object using parallax matching. The algorithm combines the advantages of disparity map segmentation and multiple-frame-difference motion segmentation. First, by segmenting the right-channel video objects through parallax matching, we obtain primary partition templates for targets in different parallax layers. Second, using high-order statistical change detection, we extract the left-channel video objects from these templates. Finally, we obtain the accurate moving target. Based on 3D multi-view video segmentation, we use an H.264-based method to encode the main image stream and thereby obtain object-based 3D video coding.
- Research Article
- 10.1007/s11767-006-0150-1
- Sep 1, 2007
- Journal of Electronics (China)
Video object extraction is a key technology in content-based video coding. This paper proposes a novel video object extraction algorithm based on two-dimensional (2-D) mesh-based motion analysis. First, a 2-D mesh fitting the original frame image is obtained via a feature detection algorithm. Then, higher-order statistics motion analysis is applied to the 2-D mesh representation to get an initial motion detection mask. After post-processing, the final segmentation mask is quickly obtained, and hence the video object is effectively extracted. Experimental results show that the proposed algorithm combines the merits of mesh-based and pixel-based segmentation algorithms, and thereby achieves satisfactory subjective and objective performance while dramatically increasing the segmentation speed.
- Conference Article
- 10.1109/icosc.2007.4338416
- Sep 1, 2007
Video object extraction is a key technology in content-based video coding. A new temporal-spatial video object segmentation algorithm is proposed in this paper. The algorithm is based on 2-D mesh-based motion analysis according to motion connectivity. First, a 2-D adaptive mesh fitting the original frame image is obtained via a feature detection algorithm. Then, higher-order statistics motion analysis according to motion connectivity is applied to the 2-D mesh representation to get a coarse motion boundary layer. After refining the coarse boundary layer and post-processing the labeled maximal connected region, the final segmentation mask is quickly obtained. Hence the video object is effectively extracted. Experimental results show that the proposed algorithm combines the merits of mesh-based and pixel-based segmentation algorithms, and therefore achieves satisfactory subjective and objective performance while dramatically increasing the segmentation speed.
- Conference Article
- 10.1109/anthology.2013.6784883
- Jan 1, 2013
To precisely extract video objects from complex backgrounds, this paper proposes a new video object extraction algorithm based on the fusion of temporal and spatial segmentation. The method extracts moving objects effectively against a complex background and overcomes the loss of object integrity that occurs when the foreground and background colors are similar. First, in the temporal domain, a background model and an improved background difference method are used to extract the complete moving object area. Subsequently, in the spatial domain, an adaptive mean-shift algorithm for color image segmentation is used to obtain a precisely segmented edge image. Finally, an improved fusion of the extracted moving object area and the segmented edge image yields video objects with precise edges. Experimental results show that the proposed algorithm segments accurate and complete video objects in video sequences with complex backgrounds.
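The temporal step in this abstract relies on a background model and a background difference. The abstract does not specify either, so the following is only a generic sketch of the basic technique, assuming a running-average background and a fixed threshold; `update_background`, `foreground_mask`, `alpha`, and `thresh` are hypothetical names and values, and the paper's improved method is not reproduced here.

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background model: blend the new frame into the background.

    Assumption: a simple exponential average, not the paper's background model.
    """
    return [[(1 - alpha) * b + alpha * f for b, f in zip(rb, rf)]
            for rb, rf in zip(bg, frame)]


def foreground_mask(bg, frame, thresh=30):
    """Mark pixels whose absolute difference from the background exceeds thresh."""
    return [[abs(f - b) > thresh for b, f in zip(rb, rf)]
            for rb, rf in zip(bg, frame)]


# Toy grayscale row: only the middle pixel differs strongly from the background.
background = [[10, 10, 10]]
frame = [[12, 200, 10]]
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

A plain difference of this kind loses object pixels whose intensity is close to the background, which is the integrity problem the abstract says its improved method and the spatial fusion step are meant to address.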
- Conference Article
1
- 10.1109/iccse.2010.5593831
- Aug 1, 2010
Automatic video object segmentation is an important and difficult problem in video processing. We propose an effective automatic video object segmentation algorithm based on temporal-spatial information. First, we obtain a video object mask based on temporal, position, and motion information, and then correct the contour of the mask based on spatial information. We construct the background of the video sequence and obtain the video object based on the corrected mask. The extraction and tracking of the video object are completed using the background information.
- Research Article
26
- 10.1023/a:1011115312953
- Aug 1, 2001
- Journal of VLSI signal processing systems for signal, image and video technology
In this paper, we propose two novel video object (VO) extraction schemes, specifically designed for two different scenarios of content-based video analysis applications. One is a change detection-based VO extraction algorithm appropriate for surveillance-type video sequences, where automatic detection of newly appearing objects is important for on-line object-oriented applications as well as object-based coding. The other is an object tracking-based method, which is especially robust to video sequences with moving background, although human intervention is needed in the process. In both cases, the semantically meaningful video objects are obtained by a final regularization stage realized by means of a cascade of morphological filters. Experimental results obtained on the MPEG-4 test sequences are presented for both schemes.
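A regularization stage built from a cascade of morphological filters can be sketched as follows. The abstract does not say which filters are cascaded, so this example assumes a binary opening followed by a closing with a 3x3 structuring element; the function names are hypothetical.

```python
def erode(mask):
    """3x3 binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    return [[all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def dilate(mask):
    """3x3 binary dilation; pixels outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [[any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def regularize(mask):
    """Assumed cascade: opening (removes specks), then closing (fills gaps)."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))


# A 3x3 object in a 7x7 mask, plus one isolated noise pixel in the corner.
object_mask = [[2 <= y <= 4 and 2 <= x <= 4 for x in range(7)] for y in range(7)]
noisy_mask = [row[:] for row in object_mask]
noisy_mask[0][6] = True
cleaned = regularize(noisy_mask)
```

Because erosion here treats out-of-image pixels as background, objects touching the image border shrink slightly under this sketch; a production implementation would choose its border handling deliberately.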
- Research Article
27
- 10.1109/tcsvt.2005.848346
- Jul 1, 2005
- IEEE Transactions on Circuits and Systems for Video Technology
As a requisite of the emerging content-based multimedia technologies, video object (VO) extraction is of great importance. This paper presents a novel semiautomatic segmentation and tracking method for single VO extraction. Unlike traditional approaches, the proposed method formulates the separation of the VO from the background as a classification problem. Each frame is divided into small blocks of uniform size, which are called object blocks if their center pixels belong to the object, or background blocks otherwise. After a manual segmentation of the first frame, the blocks of this frame are used as the training samples for the object-background classifier. A newly developed learning tool called ψ-learning is employed to train the classifier, which outperforms conventional Support Vector Machines in linearly nonseparable cases. To deal with large and complex objects, a multilayer approach constructing a so-called hyperplane tree is proposed. Each node of the tree represents a hyperplane, responsible for classifying only a subset of the training samples. Multiple hyperplanes are thus needed to classify the entire set. Through the combination of the multilayer scheme and ψ-learning, one can avoid the complexity of nonlinear mapping as well as achieve high classification accuracy. During the tracking phase, the pixel in the center of every block in a successive frame is classified by a sequence of hyperplanes from the root to a leaf node of the hyperplane tree, and the class of the block is identified accordingly. All the object blocks thus form the object of interest, whose boundary unfortunately is stair-like due to the block effect. In order to obtain the pixel-wise boundary in a cost-efficient way, a pyramid boundary refining algorithm is designed, which iteratively selects a few informative pixels for class label checking, and reduces uncertainty about the actual boundary of the object. The proposed method has been applied to video sequences with various spatial and temporal characteristics, and experimental results demonstrate it to be effective, efficient, and robust.
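The block-labeling rule in this abstract (a block is an object block if its center pixel belongs to the object) can be illustrated with a short sketch. It covers only how training labels could be derived from a manually segmented first frame; the classifier, hyperplane tree, and boundary refinement are not shown, and `label_blocks` is a hypothetical name.

```python
def label_blocks(mask, block):
    """Label each block x block tile by whether its center pixel is object.

    mask is a 2-D list of booleans (True = object), e.g. from a manual
    segmentation. Partial tiles at the right and bottom edges are skipped,
    which is a simplification made for this sketch.
    """
    h, w = len(mask), len(mask[0])
    labels = []
    for by in range(0, h - block + 1, block):
        row = []
        for bx in range(0, w - block + 1, block):
            cy, cx = by + block // 2, bx + block // 2
            row.append(bool(mask[cy][cx]))
        labels.append(row)
    return labels


# 6x6 mask whose object covers rows 0-2 and columns 0-3, split into 3x3 blocks.
first_frame_mask = [[y <= 2 and x <= 3 for x in range(6)] for y in range(6)]
block_labels = label_blocks(first_frame_mask, 3)
```

In this example the top-right block contains a column of object pixels but is labeled background because its center pixel is background, which shows where the stair-like block boundary mentioned in the abstract comes from.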
- Conference Article
- 10.1109/crv.2006.53
- Jun 7, 2006
A new method of video object extraction is proposed to accurately obtain the object of interest from actively acquired videos. Traditional video object extraction techniques often operate under the assumption of homogeneous object motion and extract various parts of the video that are motion consistent as objects. In contrast, the proposed active video object extraction (AVOE) paradigm assumes that the object of interest is being actively tracked by a non-calibrated camera under general motion and classifies the possible movements of the camera that result in the 2D motion patterns as recovered from the image sequence. Consequently, the AVOE method is able to extract the single object of interest from the active video. We formalize the AVOE process using notions from Gestalt psychology. We define a new Gestalt factor called shift and hold and present 2D object extraction algorithms. Moreover, since an active video sequence naturally contains multiple views of the object of interest, we demonstrate that these views can be combined to form a single 3D object regardless of whether the object is static or moving in the video.
- Research Article
19
- 10.1016/j.patcog.2007.07.015
- Jul 28, 2007
- Pattern Recognition
Automatic object extraction and reconstruction in active video
- Research Article
- 10.14419/ijet.v7i4.17610
- Sep 24, 2018
- International Journal of Engineering & Technology
Video object extraction (VOE) using segmentation from a video sequence is a very important task in editing and multimedia analysis for film making. Most VOE approaches require prior knowledge about the background and foreground to extract target objects. In this paper, an Optimized smoothed Dirichlet Process Multi-view learning with improved adaptive Modified Markov Random Field model, enhanced by an adaptive shape prior modified graph cut (OsDPMVL-IASMMRF), is extended for video-based object extraction. Contour tracking is additionally incorporated into OsDPMVL-IASMMRF for VOE. The Teh–Chin algorithm is used with OsDPMVL-IASMMRF to predict the contour in the current frame by matching the object contour extracted from the previously segmented frame. The contour tracking propagates the shape of the target object, whereas the OsDPMVL-IASMMRF segmentation refines the object boundary and shape to enhance the accuracy of video segmentation. The experimental outcomes show that the proposed approach provides better segmentation results in terms of accuracy, precision, and recall.
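The abstract reports accuracy, precision, and recall. Assuming these are computed per pixel against a ground-truth mask (the abstract does not state the exact protocol), they could be obtained as in the following sketch; `mask_scores` is a hypothetical helper.

```python
def mask_scores(pred, truth):
    """Pixel-wise accuracy, precision, and recall of a predicted binary mask."""
    tp = fp = fn = tn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1      # object pixel correctly extracted
            elif p and not t:
                fp += 1      # background pixel wrongly extracted
            elif t:
                fn += 1      # object pixel missed
            else:
                tn += 1      # background pixel correctly rejected
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# One true positive, one false positive, one false negative, one true negative.
truth_mask = [[1, 1, 0, 0]]
pred_mask = [[1, 0, 1, 0]]
scores = mask_scores(pred_mask, truth_mask)
```

Accuracy alone can be misleading when the object occupies a small part of the frame, since a mask that extracts nothing still scores highly; precision and recall expose that case.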