An automatic object-oriented video segmentation and representation algorithm is proposed. The local variance contrast and the frame difference contrast are jointly exploited for meaningful moving object extraction, because these two visual features efficiently indicate the spatial homogeneity of the gray levels and the temporal coherence of the motion fields, respectively. The 2-D entropic thresholding technique and the watershed transformation method are further developed to determine the global feature thresholds adaptively according to the variation of the video components. The resulting video components are first represented coarsely by a group of 4x4 blocks, and the meaningful moving objects are then generated by an iterative region-merging procedure based on a spatiotemporal similarity measure. A temporal tracking procedure is further proposed to obtain more semantic moving objects and to establish the correspondence of the moving objects among frames. Because this correspondence is established, the proposed algorithm can efficiently detect both the appearance of new objects and the disappearance of existing objects. Moreover, an object-oriented video representation and indexing approach is suggested, in which both the operation of the camera (i.e., change of the viewpoint) and the birth or death of individual objects are exploited to detect the breakpoints of the video data and to select the key frames adaptively.
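
As a rough illustration of the coarse 4x4 block representation, the sketch below computes one spatial and one temporal feature per block for a pair of gray-level frames. It is a minimal sketch under stated assumptions, not the proposed algorithm: plain block variance and mean absolute frame difference stand in for the local variance contrast and frame difference contrast, whose exact definitions are not given here, and a fixed hypothetical threshold (`diff_threshold`) replaces the adaptive 2-D entropic thresholding. The function names `block_features` and `candidate_blocks` are illustrative only.

```python
BLOCK = 4  # block size taken from the abstract's 4x4 coarse representation


def block_features(prev_frame, curr_frame, block=BLOCK):
    """Map each full block (block_row, block_col) to (variance, mean_abs_diff).

    Assumption: simple variance and mean absolute difference are stand-ins
    for the abstract's contrast features, not their actual definitions.
    """
    height, width = len(curr_frame), len(curr_frame[0])
    features = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            coords = [(y, x)
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            pixels = [curr_frame[y][x] for y, x in coords]
            diffs = [abs(curr_frame[y][x] - prev_frame[y][x]) for y, x in coords]
            mean = sum(pixels) / len(pixels)
            variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            features[(top // block, left // block)] = (variance, sum(diffs) / len(diffs))
    return features


def candidate_blocks(features, diff_threshold):
    """Return blocks whose temporal change exceeds a fixed, hypothetical threshold."""
    return sorted(key for key, (_, diff) in features.items() if diff > diff_threshold)


# Tiny 8x8 example: flat background, with only the top-left block changing.
prev_frame = [[10] * 8 for _ in range(8)]
curr_frame = [row[:] for row in prev_frame]
for y in range(4):
    for x in range(4):
        curr_frame[y][x] = 10 + 20 * ((x + y) % 2)  # checkerboard of 10s and 30s

features = block_features(prev_frame, curr_frame)
moving = candidate_blocks(features, diff_threshold=5.0)
print(moving)
```

In this toy input only the top-left block differs between the two frames, so it is the only candidate returned. In a fuller pipeline, blocks like these would be the starting regions for the iterative region merging and temporal tracking described above.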