Abstract
We propose a time-consistent video segmentation algorithm designed for real-time implementation. Our algorithm is based on a region-merging process that combines both spatial and motion information. The spatial segmentation benefits from an adaptive decision rule and a specific merging order, and has proven efficient for segmenting natural images with few parameters to set. Temporal consistency of the segmentation is ensured by incorporating motion information through an improved change-detection mask. This mask is built from both the illumination differences between frames and the region segmentation of the previous frame. By considering both the pixel and region levels, we obtain a particularly efficient algorithm at a low computational cost, allowing real-time implementation on the TriMedia processor for CIF image sequences.
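The two-level change-detection mask described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact rule: the pixel-level threshold `diff_thresh`, the region-majority ratio `region_ratio`, and the `MAX_LABELS` bound are all assumptions made for the sketch.

```c
#include <assert.h>

#define MAX_LABELS 256

/* prev, curr: grey-level frames of n pixels; labels: region
   segmentation of the previous frame (one label < MAX_LABELS per
   pixel); mask: output, 1 where the pixel belongs to a region
   flagged as changed. */
void change_detection_mask(const unsigned char *prev,
                           const unsigned char *curr,
                           const unsigned char *labels,
                           unsigned char *mask, int n,
                           int diff_thresh, double region_ratio)
{
    int changed[MAX_LABELS] = {0}, total[MAX_LABELS] = {0};

    /* Pixel level: threshold the inter-frame illumination difference
       and accumulate per-region counts over the previous frame's
       segmentation. */
    for (int i = 0; i < n; i++) {
        int d = curr[i] - prev[i];
        if (d < 0) d = -d;
        total[labels[i]]++;
        if (d > diff_thresh)
            changed[labels[i]]++;
    }

    /* Region level: flag a whole region as changed only when enough
       of its pixels changed, which regularises the pixel-wise mask. */
    for (int i = 0; i < n; i++)
        mask[i] = changed[labels[i]] >= region_ratio * total[labels[i]];
}
```

Working region-wise suppresses isolated pixel-level false detections while keeping whole moving regions coherent, which is what makes the mask cheap yet temporally stable.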
Highlights
The segmentation of each frame of a video into homogeneous regions is an important issue for many video applications, such as region-based motion estimation, image enhancement, and 2D-to-3D conversion.
We propose to benefit from scene-change detection, which is widely used in video segmentation [22,23,24], rather than motion estimation, which remains a real bottleneck for real-time implementation.
To reduce data-cache stall cycles, we investigate several optimisations, which are detailed, and we take advantage of the TriMedia processor to exploit the data-level parallelism (DLP) and instruction-level parallelism (ILP) of our algorithm.
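The kind of DLP/ILP exposure mentioned in the last highlight can be illustrated with a generic (non-TriMedia-specific) per-pixel loop. The thresholding operation here merely stands in for the real per-pixel work; the unroll factor and `restrict` qualifiers are illustrative assumptions about how one would help a VLIW compiler schedule independent operations in parallel, while the sequential row-major traversal limits data-cache stalls.

```c
#include <assert.h>

/* Unrolled-by-4 thresholding: the four comparisons per iteration are
   independent, so a VLIW/superscalar core can issue them together
   (ILP/DLP); restrict tells the compiler src and dst do not alias. */
void threshold_frame(const unsigned char *restrict src,
                     unsigned char *restrict dst,
                     int n, unsigned char t)
{
    int i = 0;
    for (; i + 3 < n; i += 4) {
        dst[i]     = src[i]     > t ? 255 : 0;
        dst[i + 1] = src[i + 1] > t ? 255 : 0;
        dst[i + 2] = src[i + 2] > t ? 255 : 0;
        dst[i + 3] = src[i + 3] > t ? 255 : 0;
    }
    for (; i < n; i++)  /* scalar tail for leftover pixels */
        dst[i] = src[i] > t ? 255 : 0;
}
```

On processors with media instructions, the same idea is usually pushed further by packing four 8-bit pixels into one 32-bit word and processing them with a single SIMD operation.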
Summary
The segmentation of each frame of a video into homogeneous regions is an important issue for many video applications, such as region-based motion estimation, image enhancement (since different processing may be applied to different regions), and 2D-to-3D conversion. Closing edges in order to create connected regions is a difficult task, and an efficient resolution of such a problem may induce cumbersome computations. Such an approach cannot benefit from the statistical properties of the considered image regions. In the context of a real-time implementation, their merging predicate still requires too many computations. Their algorithm is dedicated to the segmentation of still images and so does not take into account the temporal dimension of video sequences. In our approach, each pixel is modelled as a single random variable (in [4], the authors model each pixel as a sum of M random variables). This method gives a simpler predicate that is better suited to real-time implementation.