Abstract

We present an accurate, interactive system for semantic video object (SVO) extraction. Like earlier approaches, it obtains an SVO with an accurate boundary by integrating temporal and spatial information, but it does so in a different way. Instead of fusing spatial and temporal segmentations on the first frame or on every frame of a video sequence, our system performs spatial segmentation, temporal segmentation, and their fusion adaptively, only when necessary. To this end, the system detects the variation between successive frames. The spatial and temporal segmentations are fused only when a large variation occurs; otherwise, the system tracks the boundary of the previous SVO. We find this simple method to be both fast and accurate. Because the temporal segmentation, spatial segmentation, spatio-temporal fusion, and boundary tracking all employ simple algorithms, the system has low computational complexity.
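
The adaptive control flow described above can be illustrated with a minimal sketch. This is not the system's actual implementation: the variation measure (mean absolute pixel difference), the threshold value, the function names `frame_variation` and `choose_steps`, and the treatment of the first frame as a user-initialized step are all assumptions made for illustration. The sketch only decides, frame by frame, whether the spatio-temporal fusion or the boundary tracking would run.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (illustrative representation).
Frame = Sequence[Sequence[int]]


def frame_variation(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames.

    Assumed variation measure; the abstract does not specify one.
    """
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def choose_steps(frames: List[Frame], threshold: float) -> List[str]:
    """Pick the processing step for each frame.

    "init"  : first frame, assumed to be set up through user interaction
    "fuse"  : large variation, so re-run segmentation and fusion
    "track" : small variation, so track the previous SVO boundary
    """
    steps = []
    for i, frame in enumerate(frames):
        if i == 0:
            steps.append("init")
        elif frame_variation(frames[i - 1], frame) > threshold:
            steps.append("fuse")
        else:
            steps.append("track")
    return steps


# Hypothetical 2x2 frames: a tiny change, then a large one.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 1]],
    [[100, 100], [100, 100]],
]
steps = choose_steps(frames, threshold=10.0)
```

With these hypothetical frames, the second frame differs only slightly from the first and would be tracked, while the third differs strongly from the second and would trigger segmentation and fusion.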
